如何将我的计算 C 代码与 OpenMP 集成

如何解决如何将我的计算 C 代码与 OpenMP 集成

我使用的是带数据流模式的双通道 DAQ 卡。我写了一些分析/计算的代码，放到主代码中进行操作。但是，FIFO 溢出警告标志总是在其总数据达到 6000 MSamples 左右（DAQ 板载内存为 8GB）时出现。我很清楚，复杂的计算可能会延迟系统并导致溢出，但我写的所有作品都是我的实验所必需的，这意味着无法替换（或者有更有效的代码可以让我得到相同的结果）。我听说 OpenMP 可能是一种提高速度的解决方案，但我只是 C 的初学者，我该如何实现我的计算代码？

我的电脑有 64GB 内存和英特尔酷睿 i7 处理器。在运行数据流代码时，我总是关闭其他不必要的软件。代码已尽可能优化，例如简化 hilbert() 并使用 memcpy 挑选特定范围的数据点。

这是我处理数据的方式：

1.安装 Hilbert 变换的 FFTW 源代码。 2.For 循环将 pi16Buffer 数据去交错到 ch2Buffer 3.memcpy 获取我感兴趣的特定范围的数据，将它们放入另一个名为 ch2newBuffer 的数组中 4.在hilbert()上做ch2newBuffer并计算它的绝对数。 5.找出ch1和abs(hilbert(ch2newBuffer))的最大值。 6.计算max(abs(hilbert(ch2))) / max(ch1)。

这是我负责计算的数据采集代码的一部分：

void hilbert(const int16* in,fftw_complex* out,fftw_plan plan_forward,fftw_plan plan_backward)
{
    // copy the data to the complex array
    for (int i = 0; i < N; ++i) {
        out[i][REAL] = in[i];
        out[i][IMAG] = 0;
    }
    // creat a DFT plan and execute it
    //fftw_plan plan = fftw_plan_dft_1d(N,out,FFTW_FORWARD,FFTW_ESTIMATE);
    fftw_execute(plan_forward);
    // destroy a plan to prevent memory leak
    //fftw_destroy_plan(plan_forward);
    int hN = N>>1; // half of the length (N/2)
    int numRem = hN; // the number of remaining elements
    // multiply the appropriate value by 2
    //(those should multiplied by 1 are left intact because they wouldn't change)
    for (int i = 1; i < hN; ++i) {
        out[i][REAL] *= 2;
        out[i][IMAG] *= 2;
    }
    // if the length is even,the number of the remaining elements decrease by 1
    if (N % 2 == 0)
        numRem--;
    else if (N > 1) {
        out[hN][REAL] *= 2;
        out[hN][IMAG] *= 2;
    }
    // set the remaining value to 0
    // (multiplying by 0 gives 0,so we don't care about the multiplicands)
    memset(&out[hN + 1][REAL],numRem * sizeof(fftw_complex));
    // creat a IDFT plan and execute it
    //plan = fftw_plan_dft_1d(N,FFTW_BACKWARD,FFTW_ESTIMATE);
    fftw_execute(plan_backward);
    // do some cleaning
    //fftw_destroy_plan(plan_backward);
    //fftw_cleanup();
    // scale the IDFT output
    //for (int i = 0; i < N; ++i) {
        //out[i][REAL] /= N;
        //out[i][IMAG] /= N;
    //}
}


float SumBufferData(void* pBuffer,uInt32 u32Size,uInt32 u32SampleBits)
{
    // In this routine we sum up all the samples in the buffer. This function 
    // should be replaced with the user's analysys function


    if ( 8 == u32SampleBits )
    {
        pu8Buffer = (uInt8 *)pBuffer;
        for (i = 0; i < u32Size; i++)
        {
            i64Sum += pu8Buffer[i];
        }
    }
    else
    {
        pi16Buffer = (int16 *)pBuffer;
        
        
        
        fftw_complex(hilbertedch2[N]);
        fftw_plan plan_forward = fftw_plan_dft_1d(N,hilbertedch2,FFTW_ESTIMATE);
        fftw_plan plan_backward = fftw_plan_dft_1d(N,FFTW_ESTIMATE);
    
    
        
        ch2Buffer = (int16*)calloc(u32Size / 2,sizeof * ch2Buffer);
        ch2newBuffer= (int16*)calloc(u32Size/2,sizeof* ch2newBuffer);
        

        // De-interleave the data from pi16Buffer
        for (i = 0; i < u32Size/2 ; i++)
        {
            ch2Buffer[i] = pi16Buffer[i*2+1];

        }

        // Pick out the data points range that we are interested
        memcpy(ch2newBuffer,&ch2Buffer[6944],1024 * sizeof(ch2Buffer[0]));
        // Do the hilbert transform to these data points
        hilbert(ch2newBuffer,plan_forward,plan_backward);
        fftw_destroy_plan(plan_forward);
        fftw_destroy_plan(plan_backward);

        
        //Find max value in each segs of ch1 and ch2
        for (i = 128; i < 200 ; i++)
        {
            if (pi16Buffer[i*2] > max1)
                max1 = pi16Buffer[i*2];
        }

        for (i = 0; i < 1024; i++)
        {
            if (fabs(hilbertedch2[i][IMAG]) > max2)
                max2 = fabs(hilbertedch2[i][IMAG]);
        }
        
        Corrected = max2 / max1 / N; // Calculate the signal correction

        
        
    
        
        
    }
    free(ch2Buffer);
    free(ch2newBuffer);
    return Corrected;
}

解决方法

循环通常是并行的良好开端，例如：

#pragma omp parallel for
for (int i = 0; i < N; ++i) {
    out[i][REAL] = in[i];
    out[i][IMAG] = 0;
}

或

#pragma omp parallel for reduction(max:max2)
for (i = 0; i < 1024; i++)
{
     float tmp = fabs(hilbertedch2[i][IMAG]);
     max2 = (max2 > tmp) ? max2 : tmp.
}

话虽如此，您需要分析您的代码，找出执行花费最多时间的地方，并尽可能尝试并行化。但是，查看您发布的内容，我看不到很多并行机会。

如何将我的计算 C 代码与 OpenMP 集成

如何解决如何将我的计算 C 代码与 OpenMP 集成

解决方法

相关推荐