Monday, January 16, 2012

C++ AMP: First Contact

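This is just an attempt at converting a PPL sample to C++ AMP. The main goal was to learn the C++ AMP syntax, so the test results may not be representative of C++ AMP's strengths.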

I modified and tested the BasicParallelLoops sample from Parallel Programming with Microsoft Visual C++.

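C++ AMP differs from PPL in three ways that require code changes:
1. C++ AMP does not support reading or writing C arrays; use Concurrency::array or Concurrency::array_view instead.
2. C++ AMP has only one entry point: parallel_for_each(...) restrict(direct3d).
3. Any function called inside parallel_for_each(...) restrict(direct3d) must also be marked with the restrict(direct3d) keyword.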

Example02 from the original program becomes:

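void Example12(vector<double>& results, int workLoad)
{
 int num = static_cast<int>(results.size());

 // Copy the host vector into an accelerator-side array.
 array<double, 1> resAMPs(num, results);

 // One invocation of the lambda per element of resAMPs.
 parallel_for_each(resAMPs.grid, [&resAMPs, workLoad](index<1> idx) restrict(direct3d) {
  resAMPs[idx] = DoWorkAMP(idx.get_x(), workLoad);
 });
 // Note: as written, nothing copies resAMPs back into results.
}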

The modified DoWork from the original code:

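// Must carry restrict(direct3d) because it is called from the kernel lambda.
float DoWorkAMP(int i, int workLoad) restrict(direct3d)
{
 float result = 0;

 for (int j = 1; j < workLoad + 1; ++j)
 {
  // Convert once to float; the double version crashed (see below).
  float j2 = (float)j;
  float i2 = (float)i;
  result += sqrt((9.0f * i2 * i2 + 16.0f * i2 * i2) * j2 * j2);
 }

 return result;
}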


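DoWorkAMP has one oddity. The original code does its arithmetic in double, but in C++ AMP even a plain type conversion such as:

double i2 = (double)i;

crashes the program. This may be because most GPUs do not provide double support, but I do not know whether the compiler will be able to warn about this in the future.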
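Since the kernel had to drop from double to float, it is worth checking on the CPU how much precision that costs. The following is a minimal sketch in portable C++ (no C++ AMP involved); the function names do_work_float, do_work_double and relative_difference are made up for illustration and are not part of the book's sample. Because sqrt((9i² + 16i²)·j²) = 5·i·j for non-negative i and j, the exact sum is 5·i·workLoad·(workLoad + 1)/2, which gives an independent check: for i = 3 and workLoad = 1000 it is 7,507,500.

```cpp
#include <cmath>

// Float version: same arithmetic as DoWorkAMP, but running on the CPU.
float do_work_float(int i, int work_load) {
    float result = 0.0f;
    for (int j = 1; j < work_load + 1; ++j) {
        float j2 = static_cast<float>(j);
        float i2 = static_cast<float>(i);
        result += std::sqrt((9.0f * i2 * i2 + 16.0f * i2 * i2) * j2 * j2);
    }
    return result;
}

// Double version: the precision the original PPL sample worked in.
double do_work_double(int i, int work_load) {
    double result = 0.0;
    for (int j = 1; j < work_load + 1; ++j) {
        double j2 = static_cast<double>(j);
        double i2 = static_cast<double>(i);
        result += std::sqrt((9.0 * i2 * i2 + 16.0 * i2 * i2) * j2 * j2);
    }
    return result;
}

// Relative difference between the float and double results.
// Falls back to the absolute difference when the reference is zero (i == 0).
double relative_difference(int i, int work_load) {
    double reference = do_work_double(i, work_load);
    double approx = static_cast<double>(do_work_float(i, work_load));
    if (reference == 0.0) {
        return std::fabs(approx);
    }
    return std::fabs(approx - reference) / std::fabs(reference);
}
```

Calling relative_difference(i, workLoad) for the values of i and workLoad used in the benchmark gives a rough idea of whether float is acceptable for this workload before the kernel is moved to the GPU.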

The final performance was very poor. Results on a Core2 Quad 2.66GHz + nVidia GeForce GT 2400 + Windows 7 x64 system:


Parallel For Examples (workLoad=1000,NumberOfSteps=10)
Sequential for             : 0.19 ms
Simple parallel_for        : 0.87 ms
Simple parallel_for (AMP)  : 210.89 ms
Parallel For Examples (workLoad=100,NumberOfSteps=100)
Sequential for             : 0.14 ms
Simple parallel_for        : 0.08 ms
Simple parallel_for (AMP)  : 168.46 ms
Parallel For Examples (workLoad=10,NumberOfSteps=1000)
Sequential for             : 0.14 ms
Simple parallel_for        : 0.09 ms
Simple parallel_for (AMP)  : 172.68 ms
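For reference, numbers like these can be collected with a small timing helper. The sketch below is an assumption about how such a measurement could be written in portable C++ with std::chrono; it is not the harness from the book's sample, and the names average_ms, do_work and run_sequential are hypothetical. It runs the body once untimed before measuring, on the assumption that a first call may include one-time startup cost that would otherwise dominate sub-millisecond workloads.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// CPU version of the per-step work, in double as in the original sample.
double do_work(int i, int work_load) {
    double result = 0.0;
    for (int j = 1; j < work_load + 1; ++j) {
        double j2 = static_cast<double>(j);
        double i2 = static_cast<double>(i);
        result += std::sqrt((9.0 * i2 * i2 + 16.0 * i2 * i2) * j2 * j2);
    }
    return result;
}

// Sequential loop over all steps; returns one result per step.
std::vector<double> run_sequential(int number_of_steps, int work_load) {
    std::vector<double> results(number_of_steps);
    for (int i = 0; i < number_of_steps; ++i) {
        results[i] = do_work(i, work_load);
    }
    return results;
}

// Runs `body` once untimed as a warm-up, then `repeats` more times,
// and returns the average wall-clock time per run in milliseconds.
// `repeats` must be greater than zero.
template <typename F>
double average_ms(F&& body, int repeats) {
    using clock = std::chrono::steady_clock;
    body();  // warm-up run, not measured
    auto start = clock::now();
    for (int r = 0; r < repeats; ++r) {
        body();
    }
    auto stop = clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

The body passed to average_ms should store its result somewhere visible to the caller (for example, assign the vector returned by run_sequential to an outer variable) so that an optimizing compiler cannot discard the work being timed.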

How much C++ AMP can really benefit business computing, and where its strengths lie (apart from a syntax closer to C++), I cannot yet tell.

2 comments:

  1. Hi, it seems that GT 2400 is a DirectX 10.1 card, not a DX11 card. C++ AMP requires DX 11 card, please see: http://blogs.msdn.com/b/nativeconcurrency/archive/2011/09/22/can-i-run-c-amp-on-my-device.aspx

    So in your case, you are not running the code on GPU but on a D3D reference device, which is an interpreter and very slow. Normally it's used for debugging purpose. See http://www.danielmoth.com/Blog/GPU-Debugging-With-VS-11.aspx

    For the double support, even the graphics card says it supports double. There are differences between drivers. Assume the hardware has double capability, WDDM 1.1 driver only supports double addition, subtraction, and multiplication. WDDM 1.2 driver extends the support to double division, integer/double conversion, RCP, and FMA.

    You can provide feedback and ask any C++ AMP related questions at http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/threads.

  2. For the double issue, see Double precision support in C++ AMP (http://blogs.msdn.com/b/nativeconcurrency/archive/2012/02/07/double-precision-support-in-c-amp.aspx)
