C++ AMP In Visual Studio 11 For GPGPU Programming

Visual Studio 11 Beta is finally here. Along with the snazzy new look and other “must have” niceties comes C++ AMP (Accelerated Massive Parallelism), a new extension to C++. I must admit this was the feature that caught my eye, since I have been interested in parallelism for a while, through lambda expressions in C# and through Haskell with its built-in parallelism. However, I wanted a means of coding parallel solutions myself. My first thought was to go with CUDA, but that got me into trouble with my laptop, which started to behave strangely once I had installed the SDK.

Then came OpenCL. My challenge with OpenCL is that its kernels are written in a restricted dialect of the C language, which I was not prepared to learn at the time. OpenCL is also pretty low level and requires the programmer to do more work, such as selecting between the available CPU and a capable GPU at run time. However, one of the biggest advantages of OpenCL is that it is an open standard, and its features are mature and numerous. Microsoft has released C++ AMP under a promissory licence, but we will see what happens in the future in terms of its openness and Microsoft’s willingness to conform to industry standards.

Microsoft's TPL (Task Parallel Library) was another consideration. This library makes it relatively easy for .NET developers to add parallelism to their applications; however, I have not seen any mention of running tasks on the GPU. There are lots of new features coming in .NET 4.5, so GPGPU support in TPL may be one of them.

Enter C++ AMP. This new extension gives C++ developers the capability to run parallel tasks on their CPU or an enabled GPU; however, it is currently available only on the Windows platform. In my case, using C++ AMP was a no-brainer since I am already familiar with VC++.

Example code:

#include <amp.h>
#include <iostream>
using namespace concurrency;

void CampMethod() {
    int aCPP[] = {1, 2, 3, 4, 5};
    int bCPP[] = {6, 7, 8, 9, 10};
    int sumCPP[5] = {0, 0, 0, 0, 0};

    // Create C++ AMP objects. The inputs are read-only views over the host arrays.
    array_view<const int, 1> a(5, aCPP);
    array_view<const int, 1> b(5, bCPP);
    array_view<int, 1> sum(5, sumCPP);
    // The kernel overwrites every element, so the initial zeros need not be copied over.
    sum.discard_data();

    parallel_for_each(
        // Define the compute domain, which is the set of threads that are created.
        sum.extent,
        // Define the code to run on each thread on the accelerator.
        [=](index<1> idx) restrict(amp)
        {
            sum[idx] = a[idx] + b[idx];
        }
    );

    // Print the results. The expected output is 7, 9, 11, 13, 15, one value per line.
    for (int i = 0; i < 5; i++) {
        std::cout << sum[i] << "\n";
    }
}

It should be noted that this code can be compiled into a DLL and called from any .NET language using P/Invoke or an equivalent method. A performance overhead will be incurred for setting up and transferring data to the GPU, and there is a further penalty for loading and calling the DLL through P/Invoke. Overall, the C++ AMP extension looks like a definite contender in the sphere of parallelism.
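
To make the DLL idea a little more concrete, here is a minimal sketch of what an exported entry point might look like. The names AddArrays, AMP_EXPORT and USE_CPP_AMP are my own placeholders for illustration, not part of C++ AMP. The sketch assumes you define USE_CPP_AMP when building with Visual C++, and it falls back to a plain CPU loop otherwise so the calling convention can be tried out anywhere.

```cpp
// A sketch of a C-linkage entry point that a .NET caller could reach via P/Invoke.
// AddArrays, AMP_EXPORT and USE_CPP_AMP are hypothetical names chosen for this example.
#include <exception>

#ifdef USE_CPP_AMP              // assumption: defined only when building with Visual C++
#include <amp.h>
#endif

#ifdef _WIN32
#define AMP_EXPORT extern "C" __declspec(dllexport)
#else
#define AMP_EXPORT extern "C"
#endif

// Writes a[i] + b[i] into sum[i] for i in [0, n).
// Returns 0 on success, -1 on bad arguments, -2 if the accelerator path throws.
AMP_EXPORT int AddArrays(const int* a, const int* b, int* sum, int n) {
    if (a == nullptr || b == nullptr || sum == nullptr || n <= 0) {
        return -1;
    }
#ifdef USE_CPP_AMP
    try {
        using namespace concurrency;
        array_view<const int, 1> av(n, a);
        array_view<const int, 1> bv(n, b);
        array_view<int, 1> sv(n, sum);
        sv.discard_data();                       // every element is overwritten below
        parallel_for_each(sv.extent, [=](index<1> idx) restrict(amp) {
            sv[idx] = av[idx] + bv[idx];
        });
        sv.synchronize();                        // copy results back into the caller's buffer
    } catch (const std::exception&) {
        return -2;                               // never let C++ exceptions cross the DLL boundary
    }
#else
    for (int i = 0; i < n; ++i) {                // plain CPU stand-in for the AMP kernel
        sum[i] = a[i] + b[i];
    }
#endif
    return 0;
}
```

On the .NET side this would be paired with a DllImport declaration that names your DLL and specifies the Cdecl calling convention, passing int arrays for the three buffers. Returning an error code instead of throwing keeps the boundary simple, since the managed caller cannot catch a native C++ exception in any useful way.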
