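I'm interested in collecting information about L1 cache accesses and misses on my Maxwell card. However, I noticed that nvprof does not list any metrics or events related to the L1 cache, and according to the documentation there are indeed no longer any L1 cache metrics for compute capability 5.x.
I'd like to know whether there is an indirect way to retrieve these metrics, or whether they will be exposed sometime in the near future.
My idea is to simply use subtraction to retrieve the number of L1 cache misses: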
# L1 misses = (# of L2 accesses) - (# of L2 accesses from texture cache)
However, this approach is probably not 100% accurate.
What I'm most interested in is retrieving the global hit rate of the L1 cache.
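The subtraction above can be sketched as a small host-side helper. This is only a minimal sketch: `estimateL1Misses` and its parameter names are hypothetical, and which nvprof events or metrics would supply the two L2 counts is an assumption left open here.

```cpp
#include <cstdint>

// Hypothetical helper (not part of nvprof or the CUDA toolkit).
// Applies the subtraction from the question:
//   L1 misses ~= (L2 accesses) - (L2 accesses from texture cache)
// Both inputs are assumed to be raw counts taken from the profiler.
std::uint64_t estimateL1Misses(std::uint64_t l2Accesses,
                               std::uint64_t l2AccessesFromTex)
{
    // Clamp at zero so that inconsistent counters cannot wrap around
    // in unsigned arithmetic.
    if (l2AccessesFromTex > l2Accesses)
    {
        return 0;
    }
    return l2Accesses - l2AccessesFromTex;
}
```

For example, `estimateL1Misses(1000, 400)` yields 600. The result is only as accurate as the assumption that every L2 access not coming from the texture cache is an L1 miss.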
[sj755@localhost vectorAdd]$ nvprof --metrics tex_cache_hit_rate --events tex0_cache_sector_misses,tex1_cache_sector_misses,tex0_cache_sector_queries,tex1_cache_sector_queries ./vectorAdd
[Vector addition of 50000 elements]
==2450== NVPROF is profiling process 2450, command: ./vectorAdd
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==2450== Warning: Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==2450== Profiling application: ./vectorAdd
==2450== Profiling result:
==2450== Event result:
Invocations                   Event Name       Min       Max       Avg
Device "GeForce GTX 970 (0)"
    Kernel: vectorAdd(float const *, float const *, float*, int)
          1    tex0_cache_sector_queries     18732     18732     18732
          1    tex1_cache_sector_queries     18768     18768     18768
          1     tex0_cache_sector_misses     12504     12504     12504
          1     tex1_cache_sector_misses     12496     12496     12496
==2450== Metric result:
Invocations           Metric Name        Metric Description       Min       Max       Avg
Device "GeForce GTX 970 (0)"
    Kernel: vectorAdd(float const *, float const *, float*, int)
          1    tex_cache_hit_rate    Texture Cache Hit Rate    50.00%    50.00%    50.00%
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < numElements)
    {
        C[i] = A[i] + B[i];
    }
}
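As a cross-check on the event counts in the nvprof output above, a naive hit rate can be computed as (queries - misses) / queries, summed over both texture units. This is a minimal sketch under that assumption. The helper names are hypothetical, and the formula is a guess, not necessarily how nvprof derives `tex_cache_hit_rate`.

```cpp
#include <cstdint>

// Hypothetical helper (not an nvprof API): naive hit rate in percent,
// computed as (queries - misses) / queries.
double cacheHitRatePercent(std::uint64_t queries, std::uint64_t misses)
{
    // Return 0 for empty or inconsistent counters instead of dividing
    // by zero or wrapping around in unsigned arithmetic.
    if (queries == 0 || misses >= queries)
    {
        return 0.0;
    }
    return 100.0 * static_cast<double>(queries - misses)
                 / static_cast<double>(queries);
}

// Applies the naive formula to the event counts posted above,
// summing the tex0 and tex1 counters.
double texHitRateFromPostedCounters()
{
    const std::uint64_t queries = 18732 + 18768; // tex0 + tex1 sector queries
    const std::uint64_t misses  = 12504 + 12496; // tex0 + tex1 sector misses
    return cacheHitRatePercent(queries, misses);
}
```

With the posted counts this gives 12500 / 37500, or about 33.3%, which does not match the 50.00% that nvprof reports for `tex_cache_hit_rate`. The metric therefore seems to be derived differently from these events than the naive formula assumes.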