Question

I have these indices:

1,2,3,4,5,6,7,8,9,10,11,12,13,14,15... etc.

These are the indices of the nodes in a lower triangular matrix (including the diagonal elements):

1
2  3
4  5  6
7  8  9  10
11 12 13 14 15
16 17 18 19 20 21
etc....

I need to get the i,j coordinates from these indices:

1,1
2,1 2,2
3,1 3,2 3,3
4,1 4,2 4,3 4,4
5,1 5,2 5,3 5,4 5,5
6,1 6,2 6,3 6,4 6,5 6,6
etc..

When I need to compute the coordinates, I have only a single index and no access to the others.

Answer 1

Not optimized at all:

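int j = idx;   // remaining offset; ends up as the column
int i = 1;     // current row, which holds i entries

while(j > i) {
    j -= i++;  // skip the i entries of row i, then move to the next row
}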

Optimized:

// requires <cmath>; 2.0 keeps the arithmetic in double even for large idx
int i = static_cast<int>(std::ceil(std::sqrt(2.0 * idx + 0.25) - 0.5));
int j = idx - (i-1) * i / 2;

Here is the derivation:

You are looking for the i such that:

sumRange(1, i-1) < idx && idx <= sumRange(1, i)

where sumRange(min, max) sums the integers between min and max, both inclusive. But since you know that:

sumRange(1, i) = i * (i + 1) / 2

Then you have:

idx <= i * (i+1) / 2
=> 2 * idx <= i * (i+1)
=> 2 * idx <= i² + i + 1/4 - 1/4
=> 2 * idx + 1/4 <= (i + 1/2)²
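=> sqrt(2 * idx + 1/4) - 1/2 <= i

Since i is the smallest integer that satisfies this inequality, i = ceil(sqrt(2 * idx + 1/4) - 1/2).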
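
The result can be checked against the loop version. The following is a minimal sketch, not part of the original answer; the names Coord, coordLoop and coordClosedForm are hypothetical, and idx is assumed to be one-based and small enough that 2 * idx is exactly representable as a double.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: one-based row i and column j.
struct Coord {
    long long i;
    long long j;
};

// Reference version: subtract successive row lengths until idx fits in row i.
Coord coordLoop(long long idx) {
    assert(idx >= 1);
    long long j = idx;
    long long i = 1;
    while (j > i) {
        j -= i++;
    }
    return {i, j};
}

// Closed form: i is the smallest integer with sqrt(2*idx + 1/4) - 1/2 <= i.
Coord coordClosedForm(long long idx) {
    assert(idx >= 1);
    const double root = std::sqrt(2.0 * static_cast<double>(idx) + 0.25);
    const long long i = static_cast<long long>(std::ceil(root - 0.5));
    const long long j = idx - (i - 1) * i / 2;  // subtract the entries in rows 1..i-1
    return {i, j};
}
```

Calling both functions over a range of idx values and comparing the results is a cheap way to gain confidence in the closed form before relying on it.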

Answer 2

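In my case (a CUDA kernel implemented in standard C), I use zero-based indexing (and I want to exclude the diagonal), so I needed to make a few adjustments: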

// idx is still one-based
unsigned long int idx = blockIdx.x * blockDim.x + threadIdx.x + 1; // CUDA kernel launch parameters
// but the coordinates are now zero-based
unsigned long int x = ceil(sqrt((2.0 * idx) + 0.25) - 0.5);
unsigned long int y = idx - (x - 1) * x / 2 - 1;

This results in:

[0]: (1, 0)
[1]: (2, 0)
[2]: (2, 1)
[3]: (3, 0)
[4]: (3, 1)
[5]: (3, 2)
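
The same mapping can be tried on the host without a GPU. This is a minimal C++ sketch rather than the kernel itself; the names Pair and strictLowerCoord are hypothetical, and pos stands for the zero-based thread position (blockIdx.x * blockDim.x + threadIdx.x in the kernel above).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type: zero-based coordinates strictly below the diagonal (y < x).
struct Pair {
    std::uint64_t x;
    std::uint64_t y;
};

// pos is zero-based; idx = pos + 1 plays the role of the one-based index in the kernel.
Pair strictLowerCoord(std::uint64_t pos) {
    const std::uint64_t idx = pos + 1;
    const double root = std::sqrt(2.0 * static_cast<double>(idx) + 0.25);
    const std::uint64_t x = static_cast<std::uint64_t>(std::ceil(root - 0.5));
    const std::uint64_t y = idx - (x - 1) * x / 2 - 1;  // shift the column to zero-based
    assert(y < x);  // the diagonal is excluded
    return {x, y};
}
```

For pos = 0 through 5 this reproduces the six pairs listed above.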

I also re-derived the formula from Flórez-Rueda y Moreno 2001 and arrived at:

unsigned long int x = floor(sqrt(2.0 * pos + 0.25) + 0.5);
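
The floor form appears to give the same x as the ceil form if pos is read as the zero-based position, that is pos = idx - 1. That reading is an assumption, since the answer does not define pos. A small host-side sketch with hypothetical names rowCeil and rowFloor makes the comparison concrete:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Ceil form, written in terms of the one-based idx.
std::uint64_t rowCeil(std::uint64_t idx) {
    assert(idx >= 1);
    const double root = std::sqrt(2.0 * static_cast<double>(idx) + 0.25);
    return static_cast<std::uint64_t>(std::ceil(root - 0.5));
}

// Floor form, assuming pos is zero-based (pos = idx - 1).
std::uint64_t rowFloor(std::uint64_t pos) {
    const double root = std::sqrt(2.0 * static_cast<double>(pos) + 0.25);
    return static_cast<std::uint64_t>(std::floor(root + 0.5));
}
```

Under that assumption, rowFloor(pos) and rowCeil(pos + 1) should agree for every pos in the range where double precision is sufficient.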

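CUDA note: I tried everything I could think of to avoid double-precision math, but the single-precision sqrt function in CUDA is simply not precise enough to convert positions greater than roughly 121 million into x, y coordinates (when using 1,024 threads per block and indexing along only one block dimension). Some articles apply a "correction" that bumps the result in a particular direction, but this inevitably falls apart at some point.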
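
One alternative that is not taken from this answer, and that differs from a fixed bump, is to treat the single-precision result only as an estimate and then adjust it with exact integer comparisons against the triangular numbers. The sketch below is host-side C++ with a hypothetical name rowExact; it has not been measured on a GPU, and it assumes that i * (i + 1) / 2 fits in 64 bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: one-based row i for a one-based idx.
std::uint64_t rowExact(std::uint64_t idx) {
    assert(idx >= 1);
    // Single-precision estimate; it may be off by a row for large idx.
    const float root = std::sqrt(2.0f * static_cast<float>(idx) + 0.25f);
    std::uint64_t i = static_cast<std::uint64_t>(std::ceil(root - 0.5f));
    if (i < 1) {
        i = 1;
    }
    // Step down while rows 1..i-1 already contain idx, i.e. T(i-1) >= idx.
    while (i > 1 && (i - 1) * i / 2 >= idx) {
        --i;
    }
    // Step up while row i ends before idx, i.e. T(i) < idx.
    while (i * (i + 1) / 2 < idx) {
        ++i;
    }
    return i;
}
```

The two loops only run when the estimate is wrong, so the extra cost is a couple of integer multiplications in the common case. Whether that is cheaper than a double-precision sqrt on a given device would need to be measured.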

How do I convert a triangular matrix index into row, column coordinates?
