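How to fix a simple OpenCL matrix multiplication that does not return the correct result
I am trying to multiply two square matrices (32x32) using OpenCL with a C++ host. I am trying to reproduce the results from a book (OpenCL Programming By Example - R Banger, K Bhattacharyya) and from the basics here. However, the result is wrong. I checked every part of the code and concluded that "enqueueNDRangeKernel" seems to be set up incorrectly. Can anyone help me get past this bottleneck? I am using an NVIDIA MX250 graphics card, but I suspect this code is running on the integrated Intel GPU. The code is as follows: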
#pragma comment(lib, "OpenCL.lib")
#include <iostream>
#include <CL/cl.hpp>
#include <vector>
#include <chrono>
#include <cstdlib>
#include <string>
using namespace std;

int main()
{
    int i;
    int dim = 1024;
    float* A = (float*)malloc(sizeof(float) * dim * dim);
    float* B = (float*)malloc(sizeof(float) * dim * dim);
    float* C = (float*)malloc(sizeof(float) * dim * dim);
    for (i = 0; i < dim * dim; i++)
    {
        A[i] = (float)(rand() % 10);
        B[i] = (float)(rand() % 10);
        C[i] = 0;
    }

    // get all platforms (drivers)
    std::vector<cl::Platform> all_platforms;
    cl::Platform::get(&all_platforms);
    if (all_platforms.size() == 0) {
        std::cout << " No platforms found. Check OpenCL installation!\n";
        exit(1);
    }
    cl::Platform default_platform = all_platforms[0];
    std::cout << "Using platform: " << default_platform.getInfo<CL_PLATFORM_NAME>() << "\n";

    // get default device of the default platform
    std::vector<cl::Device> all_devices;
    default_platform.getDevices(CL_DEVICE_TYPE_ALL, &all_devices);
    if (all_devices.size() == 0) {
        std::cout << " No devices found. Check OpenCL installation!\n";
        exit(1);
    }
    cl::Device default_device = all_devices[0];
    std::cout << "Using device: " << default_device.getInfo<CL_DEVICE_NAME>() << "\n";

    cl::Context context({ default_device });
    cl::Program::Sources sources;

    // kernel computes one element of C = A * B per work-item
    std::string kernel_code =
        " void kernel simple_add(global const float* A,global const float* B,global float* C,int dim){ "
        " int iCol = get_global_id(0); "
        " int iRow = get_global_id(1); "
        " float result = 0.0; "
        " for (int i = 0; i < dim; ++i) "
        " { "
        " result += A[iRow * dim + i] * B[i * dim + iCol]; "
        " } "
        " "
        " C[iRow * dim + iCol] = result; "
        " } ";
    sources.push_back({ kernel_code.c_str(), kernel_code.length() });

    cl::Program program(context, sources);
    if (program.build({ default_device }) != CL_SUCCESS) {
        std::cout << " Error building: " << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(default_device) << "\n";
        exit(1);
    }

    // create buffers on the device
    cl::Buffer buffer_A(context, CL_MEM_READ_WRITE, sizeof(int) * dim);
    cl::Buffer buffer_B(context, CL_MEM_READ_WRITE, sizeof(int) * dim);
    cl::Buffer buffer_C(context, CL_MEM_READ_WRITE, sizeof(int) * dim);

    // create queue to which we will push commands for the device.
    cl::CommandQueue queue(context, default_device);

    // write arrays A and B to the device
    queue.enqueueWriteBuffer(buffer_A, CL_TRUE, 0, sizeof(float) * dim, A);
    queue.enqueueWriteBuffer(buffer_B, CL_TRUE, 0, sizeof(float) * dim, B);

    // run the kernel
    cl::Kernel simple_add(program, "simple_add");
    simple_add.setArg(0, buffer_A);
    simple_add.setArg(1, buffer_B);
    simple_add.setArg(2, buffer_C);
    simple_add.setArg(3, dim);
    cl::NDRange global(32, 32);
    queue.enqueueNDRangeKernel(simple_add, cl::NullRange, global, cl::NullRange);
    queue.finish();

    // float C[10];
    // read result C from the device to array C
    queue.enqueueReadBuffer(buffer_C, CL_TRUE, 0, sizeof(float) * dim, C);
    for (int i = 0; i < dim; i++) {
        std::cout << C[i] << " ";
        C[i] = 0.0;
    }
}
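Solution
I finally found the error. The problem was that I passed dim = 1024, which is the length of the linearized matrix (vector), to the kernel. It should be 32, the side length of the matrix. I am posting this in case it helps someone in the future. Thanks to everyone who took a look and tried to answer.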
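To make the difference between the side length (32) and the element count (1024) concrete, here is a minimal CPU-side sketch that can be used to check the GPU output. It assumes row-major square matrices, as in the kernel above. The names `matmul_reference` and `max_index_read` are hypothetical helpers introduced only for illustration and are not part of the original code or of OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for C = A * B on row-major n x n matrices.
// n is the side length (32 in the question), not the element count (1024).
std::vector<float> matmul_reference(const std::vector<float>& A,
                                    const std::vector<float>& B,
                                    std::size_t n)
{
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<float> C(n * n, 0.0f);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                // Same indexing as the kernel: A[iRow * dim + i] * B[i * dim + iCol]
                sum += A[row * n + k] * B[k * n + col];
            }
            C[row * n + col] = sum;
        }
    }
    return C;
}

// Largest linear index the kernel's expression A[iRow * dim + i] reaches
// on a side x side launch when `stride` is passed as the dim argument.
std::size_t max_index_read(std::size_t side, std::size_t stride)
{
    return (side - 1) * stride + (stride - 1);
}
```

With a 32x32 launch, `max_index_read(32, 32)` gives 1023, the last valid element of a 1024-element buffer, while `max_index_read(32, 1024)` gives 32767, far past the end. That is consistent with the wrong results described above. Comparing the device output against `matmul_reference(A, B, 32)` is one simple way to confirm the fix.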