How to measure GPU execution time using profiling with OpenCL, SYCL, and DPC++


I have read this link: https://github.com/intel/pti-gpu

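I tried to use Device Activity Tracing for OpenCL(TM), but I am confused: I do not understand how to use the device activity documentation to measure time on the accelerator. To measure CPU performance I used chrono, but I would like to use profiling to measure the performance of both the CPU and the GPU on different devices. My program: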

#include <CL/sycl.hpp>
#include <iostream>
#include <tbb/tbb.h>
#include <tbb/parallel_for.h>
#include <vector>
#include <string>
#include <queue>
#include<tbb/blocked_range.h>
#include <tbb/global_control.h>
#include <chrono>
#include <cstdio>


using namespace tbb;

template<class Tin,class Tout,class Function>
class Map {
private:
    Function fun;
public:
    Map() {}
    Map(Function f):fun(f) {}


    std::vector<Tout> operator()(bool use_tbb,std::vector<Tin>& v) {
        std::vector<Tout> r(v.size());
        if(use_tbb){
            // Start measuring time
            auto begin = std::chrono::high_resolution_clock::now();
            tbb::parallel_for(tbb::blocked_range<std::size_t>(0,v.size()),[&](const tbb::blocked_range<std::size_t>& t) {
                    for (std::size_t index = t.begin(); index < t.end(); ++index){
                        r[index] = fun(v[index]);
                    }
            });
            // Stop measuring time and calculate the elapsed time
            auto end = std::chrono::high_resolution_clock::now();
            auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
            printf("Time measured: %.3f seconds.\n",elapsed.count() * 1e-9);
            return r;
         } else {
                sycl::queue gpuQueue{sycl::gpu_selector()};
                sycl::range<1> n_item{v.size()};
                sycl::buffer<Tin,1> in_buffer(&v[0],n_item);
                sycl::buffer<Tout,1> out_buffer(&r[0],n_item);
                gpuQueue.submit([&](sycl::handler& h){
                    //local copy of fun
                    auto f = fun;
                    sycl::accessor in_accessor(in_buffer,h,sycl::read_only);
                    sycl::accessor out_accessor(out_buffer,h,sycl::write_only);
                    h.parallel_for(n_item,[=](sycl::id<1> index) {
                        out_accessor[index] = f(in_accessor[index]);
                    });
                }).wait();
         }
                return r;
    }
};

template<class Tin,class Tout,class Function>
Map<Tin,Tout,Function> make_map(Function f) { return Map<Tin,Tout,Function>(f);}


typedef int(*func)(int x);
//define different functions
auto function = [](int x){ return x; };
auto functionTimesTwo = [](int x){ return (x*2); };
auto functionDivideByTwo = [](int x){ return (x/2); };
auto lambdaFunction = [](int x){return (++x);};


int main(int argc,char *argv[]) {

    std::vector<int> v = {1,2,3,4,5,6,7,8,9};
    //auto f = [](int x){return (++x);};
    //Array of functions
    func functions[] =
        {
            function,functionTimesTwo,functionDivideByTwo,lambdaFunction
        };

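    // sizeof(functions) is the size in bytes; divide by the element size to get the count
    for(std::size_t i = 0; i < sizeof(functions)/sizeof(functions[0]); i++){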
        auto m1 = make_map<int,int>(functions[i]);

    //auto m1 = make_map<int,int>(f);
    std::vector<int> r = m1(true,v);
    //print the result
    for(auto &e:r) {
        std::cout << e << " ";
        }
    }


  return 0;
}

Solution

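First, SYCL kernels do not support function pointers, so you will need to change your code accordingly, for example by passing each lambda by its own type instead of through the `func` pointer array.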
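One way to make that change is sketched below in plain host C++ (it needs no SYCL to compile). Each callable keeps its own closure type in a `std::tuple` instead of decaying to `int(*)(int)`, so the `Function` template parameter of a class like `Map` would be a type that a kernel can copy by value. `apply_map` and `run_all` are hypothetical helpers introduced for illustration, not part of the original program, and C++17 is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Host-only stand-in for Map::operator(): the callable type is a template
// parameter, so a lambda is passed as its closure type, not as a pointer.
template <class Tin, class Tout, class Function>
std::vector<Tout> apply_map(Function f, const std::vector<Tin>& v) {
    std::vector<Tout> r(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        r[i] = f(v[i]);  // in the SYCL version this line would be the kernel body
    }
    return r;
}

// Applies every callable stored in the tuple to v, one result vector per
// callable. This replaces the loop over an array of function pointers.
template <class... Fs>
std::vector<std::vector<int>> run_all(const std::tuple<Fs...>& fs,
                                      const std::vector<int>& v) {
    assert(!v.empty() && "input vector is expected to be non-empty");
    std::vector<std::vector<int>> out;
    std::apply(
        [&](const auto&... f) { (out.push_back(apply_map<int, int>(f, v)), ...); },
        fs);
    return out;
}
```

For example, `run_all(std::make_tuple(function, functionTimesTwo), v)` would return two result vectors, one per lambda from the question.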

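One way to profile on the GPU is to follow these steps:

1. Enable profiling mode on the command queue of the target device.
2. Introduce an event for the target device activity.
3. Set a callback that is notified when the activity completes.
4. Read the profiling data inside the callback.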

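Basically, inside the callback you need to query CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END (via clGetEventProfilingInfo). These are the timestamps at which the command identified by the event starts and finishes executing on the device.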
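Steps 3 and 4 might look roughly like the sketch below. `ProfileLog`, `on_complete` and `attach_profiler` are hypothetical names chosen for illustration; `clSetEventCallback` and `clGetEventProfilingInfo` are standard OpenCL 1.1 calls. The sketch assumes the event comes from a command queue created with `CL_QUEUE_PROFILING_ENABLE` (step 1) and that the caller synchronizes with the callback before reading the log, since the runtime may invoke it on another thread. The OpenCL part is only compiled when `<CL/cl.h>` is found, so the bookkeeping type can be built on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping type handed to the callback through user_data.
struct ProfileLog {
    std::vector<std::uint64_t> durations_ns;

    // Stores end - start for one command; both values are in nanoseconds.
    void record(std::uint64_t start_ns, std::uint64_t end_ns) {
        assert(end_ns >= start_ns && "end timestamp must not precede start");
        durations_ns.push_back(end_ns - start_ns);
    }

    std::uint64_t total_ns() const {
        std::uint64_t sum = 0;
        for (std::uint64_t d : durations_ns) sum += d;
        return sum;
    }
};

#if __has_include(<CL/cl.h>)
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>

// Step 4: called by the OpenCL runtime once the command has completed.
inline void CL_CALLBACK on_complete(cl_event event, cl_int status, void* user_data) {
    if (status != CL_COMPLETE) return;  // negative status means the command failed
    cl_ulong start = 0, end = 0;
    if (clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,
                                sizeof(start), &start, nullptr) != CL_SUCCESS) return;
    if (clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,
                                sizeof(end), &end, nullptr) != CL_SUCCESS) return;
    static_cast<ProfileLog*>(user_data)->record(start, end);
}

// Step 3: register the callback on an event returned by an enqueue call
// (for example clEnqueueNDRangeKernel) on a profiling-enabled queue.
inline cl_int attach_profiler(cl_event event, ProfileLog* log) {
    return clSetEventCallback(event, CL_COMPLETE, on_complete, log);
}
#endif
```

After the queue has been drained, `log.total_ns() * 1e-9` gives the accumulated device time in seconds, which is comparable to the chrono measurement on the CPU side.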

You can find the detailed steps here: https://github.com/intel/pti-gpu/blob/master/chapters/device_activity_tracing/OpenCL.md

I also suggest looking at the pti-gpu samples that use device activity tracing, in the same repository: https://github.com/intel/pti-gpu/tree/master/samples
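Because the program in the question is written in SYCL rather than raw OpenCL, a different route from the OpenCL callback described above is SYCL's own event profiling, which exposes the same start and end timestamps directly on the `sycl::event`. The following is a minimal sketch, assuming a SYCL 2020 compiler such as a recent DPC++ (`<sycl/sycl.hpp>` and `sycl::gpu_selector_v` are assumptions about the toolchain). `elapsed_seconds` and `time_gpu_kernel` are hypothetical helpers, and the kernel simply doubles each element.

```cpp
#include <cassert>
#include <cstdint>

// Converts a pair of device timestamps (nanoseconds) to elapsed seconds.
inline double elapsed_seconds(std::uint64_t start_ns, std::uint64_t end_ns) {
    assert(end_ns >= start_ns && "end timestamp must not precede start");
    return static_cast<double>(end_ns - start_ns) / 1e9;
}

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#include <vector>

// Hypothetical helper: runs one kernel on a GPU queue with profiling enabled
// and returns the kernel's device execution time in seconds.
inline double time_gpu_kernel(std::vector<int>& data) {
    sycl::queue q{sycl::gpu_selector_v,
                  sycl::property_list{sycl::property::queue::enable_profiling{}}};
    sycl::buffer<int, 1> buf(data.data(), sycl::range<1>(data.size()));

    sycl::event e = q.submit([&](sycl::handler& h) {
        sycl::accessor acc(buf, h, sycl::read_write);
        h.parallel_for(sycl::range<1>(data.size()),
                       [=](sycl::id<1> i) { acc[i] *= 2; });
    });
    e.wait();  // profiling info is only meaningful once the command has finished

    const std::uint64_t start =
        e.get_profiling_info<sycl::info::event_profiling::command_start>();
    const std::uint64_t end =
        e.get_profiling_info<sycl::info::event_profiling::command_end>();
    return elapsed_seconds(start, end);
}
#endif
```

This measures only the kernel's execution on the device. Buffer transfers and queue submission overhead are not included, so the figure will usually be smaller than a chrono measurement taken around `submit(...).wait()`.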