How to multiply a 3x3 convolution kernel with an image

Time: 2018-05-18 14:00:29

Tags: c c++11 sse simd intrinsics

I have a 3x3 convolution kernel and an image represented as an array of integer pixel values.

The convolution kernels are defined as follows:

//compound convolutional kernels
//                                | 1, 0,  1|
// convolutional kernel H = src x | 0, 0,  0|
//                                |-1, 0, -1|

//                                | 1, 0, -1|
// convolutional kernel V = src x | 0, 0,  0|
//                                | 1, 0, -1|

Convolution kernel = kernel H + kernel V
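Adding the two matrices element by element leaves only two non-zero taps: +2 at the top-left and -2 at the bottom-right. Each output value therefore reduces to 2 * (top-left neighbour - bottom-right neighbour). Below is a minimal scalar sketch of that observation, useful as a reference to check a SIMD version against. It assumes 8-bit source pixels, a row-major layout, 16-bit signed output, and that the kernel is applied as written (no flip). The name `conv_hv_scalar` and the (width-2) x (height-2) output layout are illustrative and not taken from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: applies H + V to the interior of a width x height image.
   dst receives (width - 2) x (height - 2) results in row-major order. */
void conv_hv_scalar(const uint8_t *src, int16_t *dst, int width, int height)
{
    assert(width >= 3 && height >= 3);
    for (int y = 0; y < height - 2; y++) {
        const uint8_t *top = src + (size_t)y * width;        /* row y     */
        const uint8_t *bot = src + (size_t)(y + 2) * width;  /* row y + 2 */
        int16_t *out = dst + (size_t)y * (width - 2);
        for (int x = 0; x < width - 2; x++) {
            /* H + V = | 2 0 0 ; 0 0 0 ; 0 0 -2 |
               so only the top-left and bottom-right pixels contribute. */
            out[x] = (int16_t)(2 * (top[x] - bot[x + 2]));
        }
    }
}
```

The result range is -510..510, so 16-bit signed storage is sufficient.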

for(int inc=0; inc<height-2; inc++)
{
    //load 16 pixels from each of 3 consecutive rows
    str1_16pxs = _mm_loadu_si128((__m128i*)(src_all_str));
    str2_16pxs = _mm_loadu_si128((__m128i*)(src2_all_str));
    str3_16pxs = _mm_loadu_si128((__m128i*)(src3_all_str));

    //widen the low 8 pixels of each row from 8 to 16 bits
    str1_16pxs_pack1st_8to16 = _mm_cvtepu8_epi16(str1_16pxs);
    str2_16pxs_pack1st_8to16 = _mm_cvtepu8_epi16(str2_16pxs);
    str3_16pxs_pack1st_8to16 = _mm_cvtepu8_epi16(str3_16pxs);

//---!
        //here we should do the first convolution for 8 pixels
        //... How ???
//---

    //sum the first 8 widened pixels vertically across the 3 rows
    sum1_str12_vert_16pxs_pack1st_8to16  = _mm_add_epi16(str1_16pxs_pack1st_8to16,           str2_16pxs_pack1st_8to16);
    sum1_str123_vert_16pxs_pack1st_8to16 = _mm_add_epi16(sum1_str12_vert_16pxs_pack1st_8to16,str3_16pxs_pack1st_8to16);

    for(int jnc=0; jnc<(width >> 4); jnc++)
    {
        str1_16pxs_plus_8pxs = _mm_srli_si128(str1_16pxs, 8);
        str2_16pxs_plus_8pxs = _mm_srli_si128(str2_16pxs, 8);
        str3_16pxs_plus_8pxs = _mm_srli_si128(str3_16pxs, 8);

        //pack 2nd 8to16 registers (+8px's)
        str1_16pxs_pack2nd_8to16 = _mm_cvtepu8_epi16(str1_16pxs_plus_8pxs);
        str2_16pxs_pack2nd_8to16 = _mm_cvtepu8_epi16(str2_16pxs_plus_8pxs);
        str3_16pxs_pack2nd_8to16 = _mm_cvtepu8_epi16(str3_16pxs_plus_8pxs);

//---!
            //do convolution for the remaining 8px's and so on until the end of the read line
            //... How ???
//---

        //sum the second 8 widened pixels vertically across the 3 rows
        sum1_str12_vert_16pxs_pack2nd_8to16  = _mm_add_epi16(str1_16pxs_pack2nd_8to16,           str2_16pxs_pack2nd_8to16);
        sum1_str123_vert_16pxs_pack2nd_8to16 = _mm_add_epi16(sum1_str12_vert_16pxs_pack2nd_8to16,str3_16pxs_pack2nd_8to16);

//---!4     loading next 16 px's
        src_all_str += 16;
        src2_all_str += 16;
        src3_all_str += 16;

        //...

        _mm_store_si128((__m128i*)(dst_all_str), res);
        dst_all_str += 8;

    }//for(jnc)

}//for(inc)
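One possible way to fill the "How ???" placeholders, offered as a sketch under stated assumptions rather than a definitive answer: H + V has only two non-zero taps (+2 at the top-left, -2 at the bottom-right), so eight outputs need only eight pixels of the row above and eight pixels of the row below, offset by two columns. The version below uses SSE2 only (`_mm_unpacklo_epi8` against zero instead of the SSE4.1 `_mm_cvtepu8_epi16` used in the question) and finishes leftover pixels with a scalar tail. The function name `conv_hv_row_sse2` and its parameters are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: computes n outputs of H + V for one output row.
   top points at image row y, bot at row y + 2;
   both must have at least n + 2 readable pixels. */
void conv_hv_row_sse2(const uint8_t *top, const uint8_t *bot,
                      int16_t *out, int n)
{
    assert(n >= 0);
    const __m128i zero = _mm_setzero_si128();
    int x = 0;
    for (; x + 8 <= n; x += 8) {
        /* 8 pixels from the top row, 8 from the bottom row shifted by 2 columns */
        __m128i t8 = _mm_loadl_epi64((const __m128i *)(top + x));
        __m128i b8 = _mm_loadl_epi64((const __m128i *)(bot + x + 2));
        /* zero-extend unsigned 8-bit pixels to 16-bit lanes */
        __m128i t16 = _mm_unpacklo_epi8(t8, zero);
        __m128i b16 = _mm_unpacklo_epi8(b8, zero);
        /* 2 * (top - bottom); the result fits in a signed 16-bit lane */
        __m128i diff = _mm_sub_epi16(t16, b16);
        __m128i res = _mm_slli_epi16(diff, 1);
        /* unaligned store, since out + x is not guaranteed 16-byte aligned */
        _mm_storeu_si128((__m128i *)(out + x), res);
    }
    /* scalar tail for the remaining n % 8 outputs */
    for (; x < n; x++)
        out[x] = (int16_t)(2 * (top[x] - bot[x + 2]));
}
```

Calling this once per output row, with `top` and `bot` advanced by the image stride each time, would cover the whole image. The middle row is never read because its kernel coefficients are all zero.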

1 answer:

Answer 0 (score: -2)

So, sample code:

let filteredArr = Array(NSOrderedSet(array: arr)) 

Writing the code takes a long time. I am new to this, so it is a pity that people proficient in CE did not help with the intrinsics. :(