Retrieving L1 cache metrics and events on Maxwell

Time: 2015-01-06 22:19:27

Tags: cuda profiling gpu nvvp

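I am interested in collecting information about L1 cache accesses and misses on my Maxwell card. However, I noticed that nvprof does not list any metrics or events related to the L1 cache, and according to the documentation there are indeed no longer any L1-cache-related metrics for compute capability 5.x.

I would like to know whether there is an indirect way to retrieve these metrics, or whether they will be exposed at some point in the near future.

My idea is simply to use subtraction to obtain the number of L1 cache misses: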

# L1 misses = (# of L2 accesses) - (# of L2 accesses from texture cache)

However, this approach may not be 100% accurate.

What I am most interested in is retrieving the L1 cache global hit rate.
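
The subtraction above can be written out as a small host-side helper. This is only a sketch of the estimate as stated: the function name and both parameters are hypothetical placeholders for counter values read from a profiler run, not actual nvprof metric names, and the result inherits whatever inaccuracy the estimate itself has.

```cpp
#include <cstdint>

// Hypothetical helper: both arguments stand for counter values taken from a
// profiler run. They are illustrative names, not real nvprof metrics.
std::uint64_t estimate_l1_misses(std::uint64_t l2_accesses,
                                 std::uint64_t l2_accesses_from_tex)
{
    // Guard against unsigned underflow if the two counters are inconsistent.
    if (l2_accesses_from_tex > l2_accesses) {
        return 0;
    }
    // L1 misses = (# of L2 accesses) - (# of L2 accesses from texture cache)
    return l2_accesses - l2_accesses_from_tex;
}
```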


[sj755@localhost vectorAdd]$ nvprof --metrics tex_cache_hit_rate --events tex0_cache_sector_misses,tex1_cache_sector_misses,tex0_cache_sector_queries,tex1_cache_sector_queries ./vectorAdd
[Vector addition of 50000 elements]
==2450== NVPROF is profiling process 2450, command: ./vectorAdd
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==2450== Warning: Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==2450== Profiling application: ./vectorAdd
==2450== Profiling result:
==2450== Event result:
Invocations                                Event Name         Min         Max         Avg
Device "GeForce GTX 970 (0)"
    Kernel: vectorAdd(float const *, float const *, float*, int)
          1                 tex0_cache_sector_queries       18732       18732       18732
          1                 tex1_cache_sector_queries       18768       18768       18768
          1                  tex0_cache_sector_misses       12504       12504       12504
          1                  tex1_cache_sector_misses       12496       12496       12496

==2450== Metric result:
Invocations                               Metric Name                        Metric Description         Min         Max         Avg
Device "GeForce GTX 970 (0)"
    Kernel: vectorAdd(float const *, float const *, float*, int)
          1                        tex_cache_hit_rate                    Texture Cache Hit Rate      50.00%      50.00%      50.00%
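
As a cross-check on the output above, a hit rate can be derived by hand from the four event counters. The sketch below assumes the simplest definition, (queries - misses) / queries summed over both texture units; the struct and function names are hypothetical. With the counters shown (37500 queries and 25000 misses in total) this ratio comes to about 33.3%, not the 50.00% that nvprof reports for tex_cache_hit_rate, so the metric is evidently not this simple ratio. How nvprof actually derives it is not stated here.

```cpp
#include <cstdint>

// Hypothetical container for one texture unit's event counters
// (texN_cache_sector_queries and texN_cache_sector_misses).
struct TexCounters {
    std::uint64_t queries;
    std::uint64_t misses;
};

// Naive hit rate over both units: (queries - misses) / queries.
// This is an assumed definition, not necessarily the one nvprof uses.
double naive_hit_rate(TexCounters tex0, TexCounters tex1)
{
    const std::uint64_t queries = tex0.queries + tex1.queries;
    const std::uint64_t misses  = tex0.misses + tex1.misses;
    if (queries == 0 || misses > queries) {
        return 0.0;  // no data, or inconsistent counters
    }
    return static_cast<double>(queries - misses) / static_cast<double>(queries);
}
```

Calling naive_hit_rate({18732, 12504}, {18768, 12496}) with the values from the run above yields roughly 0.333.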

__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < numElements)
    {
        C[i] = A[i] + B[i];
    }
}

0 answers:

No answers