halide_copy_to_host takes too much time

Time: 2019-07-26 10:31:50

Tags: halide

The Halide generator converts RGBA to HSV and changes the H value to a different value.
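
For readers unfamiliar with the operation, here is a minimal per-pixel sketch of "convert RGBA to HSV and replace H", written in plain Python with the standard `colorsys` module. It is not the generator from the question (which is not shown); the function name `set_hue` and the choice to convert back to RGBA afterwards are assumptions made only for illustration.

```python
import colorsys

def set_hue(rgba, new_hue):
    """Return the RGBA pixel with its hue replaced by new_hue (range 0.0 to 1.0).

    Hypothetical helper: saturation, value and alpha are kept unchanged.
    """
    r, g, b, a = rgba
    # colorsys works on floats in [0, 1], so scale the 8-bit channels first.
    _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    r2, g2, b2 = colorsys.hsv_to_rgb(new_hue, s, v)
    return (round(r2 * 255), round(g2 * 255), round(b2 * 255), a)

# Pure red with the hue moved to 0.5 becomes cyan.
print(set_hue((255, 0, 0, 255), 0.5))
```

A gray pixel has zero saturation, so changing its hue leaves it unchanged.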

The actual copy-to-device time is 75 ms and the run time is 430 ms, but the device-to-host copy takes 4.3 seconds.

I'm not sure why copying 126 MB from host to device takes less time than copying 31 MB from device to host.
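
To put numbers on that asymmetry, the sketch below recomputes the two buffer sizes from the image dimensions in the log and derives the effective throughput of each copy. The 4-byte-per-element factor is inferred from the log (126443520 is exactly four times 31610880, and the shader declares `float data[]`); it is an assumption about why the sizes differ, not something the log states outright. The helper name `throughput_mb_per_s` is made up for this example.

```python
# Dimensions and timings are taken from the debug log in the question.
WIDTH, HEIGHT, CHANNELS = 4032, 1960, 4

host_bytes = WIDTH * HEIGHT * CHANNELS   # one uint8 per element on the host
device_bytes = host_bytes * 4            # assumed: one 4-byte float per element on the device

to_device_ms = 75.63319   # "Time: 7.563319e+01 ms for copy to dev"
to_host_ms = 4382.263     # "Time: 4.382263e+03 ms for copy to host"

def throughput_mb_per_s(num_bytes, elapsed_ms):
    """Effective throughput in MB/s (1 MB = 10**6 bytes)."""
    return (num_bytes / 1e6) / (elapsed_ms / 1e3)

up = throughput_mb_per_s(device_bytes, to_device_ms)
down = throughput_mb_per_s(host_bytes, to_host_ms)

print(f"host buffer:   {host_bytes} bytes")
print(f"device buffer: {device_bytes} bytes")
print(f"host -> device: {up:.0f} MB/s")
print(f"device -> host: {down:.1f} MB/s")
print(f"ratio: {up / down:.0f}x")
```

With these figures the upload runs at well over 1 GB/s while the readback manages only a few MB/s, a gap of more than two orders of magnitude.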

Debug log for the issue:

   ```
    2019-07-26 15:12:37.453 5045-5045/com.example.hellohalide D/halide_native: reading bitmap info...
    2019-07-26 15:12:37.453 5045-5045/com.example.hellohalide D/halide_native: width:4032 height:1960 stride:16128
    2019-07-26 15:12:37.453 5045-5045/com.example.hellohalide E/halide_native: The selected filter is :: Cartoon
    2019-07-26 15:12:37.453 5045-5045/com.example.hellohalide D/halide_native: reading bitmap pixels...
    2019-07-26 15:12:37.454 5045-5045/com.example.hellohalide I/halide: Entering Pipeline cartoon
    2019-07-26 15:12:37.454 5045-5045/com.example.hellohalide I/halide: Target: arm-64-android-debug-openglcompute
    2019-07-26 15:12:37.454 5045-5045/com.example.hellohalide I/halide:  Input Buffer input8: buffer(0, 0x0, 0x72d4600000, 1, uint8, {0, 4032, 4}, {0, 1960, 16128}, {0, 4, 1})
    2019-07-26 15:12:37.454 5045-5045/com.example.hellohalide I/halide:  Output Buffer curved: buffer(0, 0x0, 0x72d2600000, 0, uint8, {0, 4032, 4}, {0, 1960, 16128}, {0, 4, 1})
    2019-07-26 15:12:37.460 5045-5045/com.example.hellohalide I/halide: Halide running on 0x72e4ccdc2a
    2019-07-26 15:12:37.460 5045-5045/com.example.hellohalide I/halide: Compute shader source for: kernel_curved_s0_y_yo___block_id_y
    2019-07-26 15:12:37.460 5045-5045/com.example.hellohalide I/halide: 

  compute shader generated:
  '''
  #version 310 es
          #extension GL_ANDROID_extension_pack_es31a : require
          float float_from_bits(int x) { return intBitsToFloat(int(x)); }
          layout(location = 0) uniform int _curved_extent_0;
          layout(location = 1) uniform int _curved_extent_1;
          layout(location = 2) uniform int _curved_min_0;
          layout(location = 3) uniform int _curved_min_1;
          layout(location = 4) uniform int _curved_stride_1;
          layout(location = 5) uniform int _input8_min_0;
          layout(location = 6) uniform int _input8_min_1;
          layout(location = 7) uniform int _input8_min_2;
          layout(location = 8) uniform int _input8_stride_1;
          layout(binding=9) buffer buffer9 { float data[]; } _curved;
          layout(binding=10) buffer buffer10 { float data[]; } _input8;
          void main()
          {
            int _curved_s0_y_yoXX_block_id_y = int(gl_WorkGroupID.y);
            int _curved_s0_x_xoXX_block_id_x = int(gl_WorkGroupID.x);
            int XX_thread_id_y = int(gl_LocalInvocationID.y);
            int XX_thread_id_x = int(gl_LocalInvocationID.x);
            int _0 = _curved_s0_y_yoXX_block_id_y * int(8);
            float _1 = float(_0);
            int _2 = _

  ```
debug logs:


    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: Active Uniforms: 9
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: Uniform 0 Type: 5124 Name: _input8_stride_1 location: 8
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: Uniform 1 Type: 5124 Name: _input8_min_2 location: 7
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: Uniform 2 Type: 5124 Name: _input8_min_1 location: 6
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: Uniform 3 Type: 5124 Name: _input8_min_0 location: 5
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: Uniform 4 Type: 5124 Name: _curved_stride_1 location: 4
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: Uniform 5 Type: 5124 Name: _curved_min_1 location: 3
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: Uniform 6 Type: 5124 Name: _curved_min_0 location: 2
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: Uniform 7 Type: 5124 Name: _curved_extent_1 location: 1
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: Uniform 8 Type: 5124 Name: _curved_extent_0 location: 0
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide:     Time: 8.070062e+01 ms
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: halide_copy_to_device validating input buffer: buffer(0, 0x0, 0x72d2600000, 0, uint8, {0, 4032, 4}, {0, 1960, 16128}, {0, 4, 1})
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: halide_device_malloc validating input buffer: buffer(0, 0x0, 0x72d2600000, 0, uint8, {0, 4032, 4}, {0, 1960, 16128}, {0, 4, 1})
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: halide_device_malloc: target device interface 0x72e303d0f8
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: OpenGLCompute: halide_openglcompute_device_malloc (user_context: 0x0, buf: 0x72e303e238)
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide:     allocating buffer, extents: 4032x1960x4x0 strides: 4x16128x1x0 (type: uint8)
    2019-07-26 15:12:37.534 5045-5045/com.example.hellohalide I/halide: openglcompute_device_malloc: initialization completed.
    2019-07-26 15:12:37.603 5045-5045/com.example.hellohalide I/halide: Allocated dev_buffer(i.e. vbo) 1
    2019-07-26 15:12:37.603 5045-5045/com.example.hellohalide I/halide:     Time: 6.894481e+01 ms for malloc
    2019-07-26 15:12:37.603 5045-5045/com.example.hellohalide I/halide: halide_copy_to_device validating input buffer: buffer(0, 0x0, 0x72d4600000, 1, uint8, {0, 4032, 4}, {0, 1960, 16128}, {0, 4, 1})
    2019-07-26 15:12:37.603 5045-5045/com.example.hellohalide I/halide: halide_device_malloc validating input buffer: buffer(0, 0x0, 0x72d4600000, 1, uint8, {0, 4032, 4}, {0, 1960, 16128}, {0, 4, 1})
    2019-07-26 15:12:37.603 5045-5045/com.example.hellohalide I/halide: halide_device_malloc: target device interface 0x72e303d0f8
    2019-07-26 15:12:37.603 5045-5045/com.example.hellohalide I/halide: OpenGLCompute: halide_openglcompute_device_malloc (user_context: 0x0, buf: 0x72e303e1c0)
    2019-07-26 15:12:37.603 5045-5045/com.example.hellohalide I/halide:     allocating buffer, extents: 4032x1960x4x0 strides: 4x16128x1x1 (type: uint8)
    2019-07-26 15:12:37.603 5045-5045/com.example.hellohalide I/halide: openglcompute_device_malloc: initialization completed.
    2019-07-26 15:12:37.657 5045-5045/com.example.hellohalide I/halide: Allocated dev_buffer(i.e. vbo) 2
    2019-07-26 15:12:37.657 5045-5045/com.example.hellohalide I/halide:     Time: 5.314331e+01 ms for malloc
    2019-07-26 15:12:37.657 5045-5045/com.example.hellohalide I/halide: halide_copy_to_device 0x72e303e1c0 host is dirty
    2019-07-26 15:12:37.657 5045-5045/com.example.hellohalide I/halide: OGLC: halide_openglcompute_copy_to_device (user_context: 0x0, buf: 0x72e303e1c0, the_buffer:2)
    2019-07-26 15:12:37.657 5045-5045/com.example.hellohalide I/halide: Calling global_state.MapBufferRange(GL_ARRAY_BUFFER, 0, 126443520, GL_MAP_READ_BIT|GL_MAP_WRITE_BIT)
    2019-07-26 15:12:37.657 5045-5045/com.example.hellohalide I/halide: c.extent[0] = 4032
    2019-07-26 15:12:37.657 5045-5045/com.example.hellohalide I/halide: c.extent[1] = 1960
    2019-07-26 15:12:37.657 5045-5045/com.example.hellohalide I/halide: c.extent[0] = 4
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:   copied 126443520 bytes from 0x72d4600000 to the device.
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     Time: 7.563319e+01 ms for copy to dev
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide: OpenGLCompute: halide_openglcompute_run (user_context: 0x0, entry: kernel_curved_s0_y_yo___block_id_y, blocks: 504x245x1, threads: 8x8x1, shmem: 0, num_attributes: 0, num_coords_dim0: 0, num_coords_dim1: 0
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 0 int32 [0x7a800000fc0 ...] 0
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 1 int32 [0x7a8 ...] 0
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 2 int32 [0x0 ...] 0
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 3 int32 [0x3f0000000000 ...] 0
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 4 int32 [0x3f00 ...] 0
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 5 int32 [0x0 ...] 0
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 6 int32 [0x0 ...] 0
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 7 int32 [0x3f0000000000 ...] 0
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 8 int32 [0xf705181000003f00 ...] 0
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 9 uint8 [0x1 ...] 1
    2019-07-26 15:12:37.732 5045-5045/com.example.hellohalide I/halide:     args 10 uint8 [0x2 ...] 1
    2019-07-26 15:12:38.165 5045-5045/com.example.hellohalide I/halide:     Time: 4.323138e+02 ms for run
    2019-07-26 15:12:38.165 5045-5045/com.example.hellohalide I/halide: Exiting Pipeline cartoon

copy_to_host takes 4.3 seconds

    2019-07-26 15:12:38.165 5045-5045/com.example.hellohalide I/halide: halide_copy_to_host validating input buffer: buffer(1, 0x72e303d0f8, 0x72d2600000, 2, uint8, {0, 4032, 4}, {0, 1960, 16128}, {0, 4, 1})
    2019-07-26 15:12:38.165 5045-5045/com.example.hellohalide I/halide: copy_to_host_already_locked 0x72e303e238 dev_dirty is true
    2019-07-26 15:12:38.165 5045-5045/com.example.hellohalide I/halide: OGLC: halide_openglcompute_copy_to_host (user_context: 0x0, buf: 0x72e303e238, the_buffer:1, size=31610880)
    2019-07-26 15:12:38.188 5045-5045/com.example.hellohalide I/halide: c.extent[0] = 4032
    2019-07-26 15:12:38.188 5045-5045/com.example.hellohalide I/halide: c.extent[1] = 1960
    2019-07-26 15:12:38.188 5045-5045/com.example.hellohalide I/halide: c.extent[0] = 4
    2019-07-26 15:12:42.547 5045-5045/com.example.hellohalide I/halide:   copied 31610880 bytes to the host.
    2019-07-26 15:12:42.547 5045-5045/com.example.hellohalide I/halide:     Time: 4.382263e+03 ms for copy to host
    2019-07-26 15:12:42.547 5045-5045/com.example.hellohalide D/halide_native: Time taken: 5093477 (0)
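
As a sanity check on where the time goes, the sketch below pulls the `Time: ... ms` lines out of the log and adds them up. The six lines are copied from the log above with the timestamp prefix trimmed; the regular expression and the function name `stage_times` are assumptions for this example, and the first entry has no label in the log, so it is reported as "unlabeled".

```python
import re

# "Time:" lines copied from the debug log in the question (prefix trimmed).
LOG_TIME_LINES = """\
I/halide:     Time: 8.070062e+01 ms
I/halide:     Time: 6.894481e+01 ms for malloc
I/halide:     Time: 5.314331e+01 ms for malloc
I/halide:     Time: 7.563319e+01 ms for copy to dev
I/halide:     Time: 4.323138e+02 ms for run
I/halide:     Time: 4.382263e+03 ms for copy to host
"""

# Matches "Time: <number> ms" with an optional "for <stage>" suffix.
TIME_RE = re.compile(r"Time:\s*([0-9.eE+-]+)\s*ms(?:\s+for\s+(.+))?")

def stage_times(log_text):
    """Return a list of (stage label, milliseconds) in log order."""
    stages = []
    for line in log_text.splitlines():
        match = TIME_RE.search(line)
        if match:
            label = (match.group(2) or "unlabeled").strip()
            stages.append((label, float(match.group(1))))
    return stages

stages = stage_times(LOG_TIME_LINES)
total_ms = sum(ms for _, ms in stages)

for label, ms in stages:
    print(f"{label:>14}: {ms:9.2f} ms ({100 * ms / total_ms:5.1f}%)")
print(f"{'total':>14}: {total_ms:9.2f} ms")
```

The stages sum to roughly 5093 ms, which lines up with the final `Time taken: 5093477` if that figure is in microseconds (an assumption; the log does not give the unit). The copy to host alone accounts for about 86% of the total.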

0 answers:

No answers yet.