Question

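My program processes a large amount of data, and the find function takes a lot of time to execute. At some point I get a logical vector, and I want to extract the row indices of the elements that are 1. How can I do this without using the find function?

Here is a demo: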

temp = rand(10000000, 1);
temp1 = temp > 0.5;
temp2 = find(temp1);

But with more data it is too slow. Any suggestions?

Thanks

Answer 1

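find seems to be a very well-optimized function. What I did was write a MEX version that is tailored to this particular problem. The running time was cut roughly in half. :)

Here is the code: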

#include <math.h>
#include <matrix.h>
#include <mex.h>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mxLogical *in;
    double *out;
    mwSize i, nInput, nTrues;  // mwSize rather than int, so very large inputs do not overflow

    // Get the number of elements of the input.
    nInput = mxGetNumberOfElements(prhs[0]);

    // Get a pointer to the logical input array.
    in = mxGetLogicals(prhs[0]);    

    // Allocate memory for the output. As we don't know the number of
    // matches, we allocate an array the same size as the input. We will
    // probably shrink it later.
    out = mxMalloc(sizeof(double) * nInput);

    // Count the number of 'trues' and store their (1-based) positions.
    for (nTrues = 0, i = 0; i < nInput; )
        if (in[i++])
            out[nTrues++] = i;

    // Reallocate the array, if necessary.
    if (nTrues < nInput)
        out = mxRealloc(out, sizeof(double) * nTrues);

    // Assign the indexes to the output array.
    plhs[0] = mxCreateDoubleMatrix(0, 0, mxREAL);
    mxSetPr(plhs[0], out);
    mxSetM(plhs[0], nTrues);
    mxSetN(plhs[0], 1);
}

Just save it to a file named find2.c and compile it with mex find2.c.

Assuming:

temp = rand(10000000, 1);
temp1 = temp > 0.5;

Running times:

tic
temp2 = find(temp1);
toc

Elapsed time is 0.082875 seconds.

tic
temp2 = find2(temp1);
toc

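Elapsed time is 0.044330 seconds.

Important: this function has no error handling. It assumes that the input is always a logical array and that the output is a double array. Use it with care.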
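
Since find2 does no checking of its own, one option is to guard it from the MATLAB side. The following is only a sketch: the wrapper name find2_checked and its error identifier are made up for illustration and are not part of the answer. It also rejects sparse inputs, on the assumption that the MEX code above reads the input as a full logical array.

```matlab
function idx = find2_checked(mask)
% FIND2_CHECKED  Hypothetical guard around the find2 MEX function.
%   Rejects inputs that find2 is not written to handle, then delegates.
    if ~islogical(mask) || issparse(mask)
        error('find2_checked:badInput', ...
              'Input must be a full (non-sparse) logical array.');
    end
    idx = find2(mask);   % find2 is the compiled MEX file from this answer
end
```

The check itself is cheap because islogical and issparse only inspect the array's type, not its contents.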

Answer 2

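You could try splitting the computation into small chunks. This does not reduce the amount of computation you have to do, but it may still be faster because the data fits in fast cache memory rather than slow main memory (or, in the worst case, you might even be swapping to disk). Something like this: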

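temp = rand(10000000, 1);
n = 100000; % chunk size
for i = 1:ceil(length(temp) / n)            % ceil so a shorter final chunk is not skipped
    firstIdx = (i-1) * n + 1;
    lastIdx = min(i * n, length(temp));
    chunk = temp(firstIdx:lastIdx);
    temp1 = chunk > 0.5;
    temp2 = find(temp1) + (firstIdx - 1);   % shift chunk-local indices to global ones
    do_stuff(temp2);                        % placeholder for your own processing
end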
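
If the per-chunk results have to be combined into one index vector, keep in mind that find on a chunk returns positions relative to that chunk. A minimal sketch of collecting global indices, with smaller sizes than in the question and with variable names (parts, allIdx, firstIdx, lastIdx) chosen here for illustration:

```matlab
temp = rand(1000000, 1);
n = 100000;                          % chunk size (an arbitrary choice)
numChunks = ceil(numel(temp) / n);   % ceil keeps a shorter final chunk
parts = cell(numChunks, 1);          % one cell per chunk, so no array grows in the loop

for k = 1:numChunks
    firstIdx = (k-1) * n + 1;
    lastIdx  = min(k * n, numel(temp));
    % find returns chunk-local positions; add the offset to make them global.
    parts{k} = find(temp(firstIdx:lastIdx) > 0.5) + (firstIdx - 1);
end

allIdx = vertcat(parts{:});          % concatenate the per-chunk column vectors

% Sanity check against a single call on the whole vector.
assert(isequal(allIdx, find(temp > 0.5)));
```

Whether this is actually faster than one call to find depends on the machine and the data size, so it is worth timing both.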

Answer 3

You can create a regular index array and then apply logical indexing. I haven't checked whether it is faster than find, though.

Example:

Index = (1:numel(temp))';   % numel gives a scalar count; column vector to match find's output here
Found = Index(temp1);
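
Because the answer leaves the speed question open, here is one way it could be measured. This is a sketch, not a benchmark result: it uses timeit, and the handle names viaFind and viaIndex are chosen here for illustration.

```matlab
temp  = rand(10000000, 1);
temp1 = temp > 0.5;

% Approach from the question: find on the logical vector.
viaFind = @() find(temp1);

% Approach from this answer: logical indexing into a prebuilt index array.
Index = (1:numel(temp))';
viaIndex = @() Index(temp1);

% Both approaches must agree before the timings mean anything.
assert(isequal(viaFind(), viaIndex()));

% timeit calls each handle several times and returns a typical time in seconds.
fprintf('find:             %.4f s\n', timeit(viaFind));
fprintf('logical indexing: %.4f s\n', timeit(viaIndex));
```

Note that building Index is not included in the timing above. If the index array cannot be reused across calls, its construction cost should be counted as well.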

Finding the row indices of a logical vector without using find()
