Question

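I have a matrix-building problem. To build the matrix (for a third-party package), I need to do it row by row, passing double[] arrays to the third-party object. Here is my problem: I have a list of objects that represent paths on a graph. Each object is a path with a 'source' property (a string) and a 'destination' property (also a string). I need to build a one-dimensional array in which every element is 0, except those where the source property equals a given name. The given name will occur multiple times in the path list. Here is my function for building the sparse array: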

    static double[] GetNodeSrcRow3(string nodeName)
    {
        double[] r = new double[cpaths.Count ];
        for (int i = 1; i < cpaths.Count; i++)
        {
            if (cpaths[i].src == nodeName) r[i] = 1;
        }
        return r;
    }

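Now I need to call this function about 200k times with different names. The function itself takes 0.05 to 0.1 seconds (timed with Stopwatch). As you can imagine, even the best case of 0.05 s × 200k calls = 10,000 s, or roughly 2.8 hours, which is far too long. The 'cpaths' object contains about 200k objects.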

Can anyone think of a way to accomplish this faster?

Answer 1

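I can't see the rest of your code, but I suspect most of the time is spent allocating and garbage-collecting all those arrays. Assuming the size of cpaths doesn't change, you can reuse the same array.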

private static double[] NodeSourceRow = null;
private static List<int> LastSetIndices = new List<int>();

static double[] GetNodeSrcRow3(string nodeName) {
    // create new array *only* on the first call
    NodeSourceRow = NodeSourceRow ?? new double[cpaths.Count];

    // reset all elements to 0
    foreach(int i in LastSetIndices) NodeSourceRow[i] = 0;
    LastSetIndices.Clear();

    // set the 1s
    for (int i = 1; i < cpaths.Count; i++) {
        if (cpaths[i].src == nodeName) {
            NodeSourceRow[i] = 1;
            LastSetIndices.Add(i);
        }
    }

    // tada!!
    return NodeSourceRow;
}

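One potential drawback is that if you need to use all the arrays at the same time, they will always have identical contents, because every call returns the same array. But if you only use one at a time, this should be much faster.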
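To make that drawback concrete, here is a small sketch of the aliasing. It is not the asker's code: `Sources` is a hypothetical stand-in for the `src` values in `cpaths`, and the loop starts at 0 for simplicity. It only shows that a row you want to keep has to be copied before the next call.

```csharp
using System;
using System.Collections.Generic;

public static class SharedRowDemo
{
    // Hypothetical stand-in for cpaths: just the src value of each path.
    static readonly List<string> Sources = new List<string> { "A", "B", "A", "C" };

    static double[] _row;                                  // the one reused buffer
    static readonly List<int> _lastSet = new List<int>();  // indexes set by the previous call

    public static double[] GetRow(string nodeName)
    {
        _row = _row ?? new double[Sources.Count];

        // Undo only the 1s written by the previous call.
        foreach (int i in _lastSet) _row[i] = 0;
        _lastSet.Clear();

        for (int i = 0; i < Sources.Count; i++)
        {
            if (Sources[i] == nodeName) { _row[i] = 1; _lastSet.Add(i); }
        }
        return _row;
    }

    public static void Main()
    {
        double[] a = GetRow("A");
        double[] aCopy = (double[])a.Clone();   // snapshot taken before the next call
        double[] b = GetRow("B");

        Console.WriteLine(ReferenceEquals(a, b));     // True: same array object
        Console.WriteLine(string.Join(",", a));       // 0,1,0,0 (overwritten by the "B" call)
        Console.WriteLine(string.Join(",", aCopy));   // 1,0,1,0
    }
}
```

If the third-party object copies each row as it receives it, the shared buffer is safe; if it keeps a reference, you would need the `Clone()` (and lose most of the allocation savings).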

Answer 2

If cpaths is a plain list, it isn't well suited to your case. You need a dictionary that maps each src to its list indexes, something like Dictionary<string, List<int>>.

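Then you can fill the sparse array by random access. I would also suggest using a sparse list implementation for efficient memory use, rather than a memory-inefficient double[]. A good implementation is SparseAList (written by David Piepgrass).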

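Before generating the sparse lists, you should convert the cpaths list into a suitable dictionary. This step may take a while (up to a few seconds), but after that you will be able to generate the sparse lists extremely quickly.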

public static Dictionary<string, List<int>> _dictionary;

public static void CacheIndexes()
{
    _dictionary = cpaths.Select((x, i) => new { index = i, value = x })
                        .GroupBy(x => x.value.src)
                        .ToDictionary(x => x.Key, x => x.Select(a => a.index).ToList());
}

You should call CacheIndexes before you start generating the sparse arrays.

public static double[] GetNodeSrcRow3(string nodeName)
{
    double[] r = new double[cpaths.Count];
    List<int> indexes;
    if(!_dictionary.TryGetValue(nodeName, out indexes)) return r;

    foreach(var index in indexes) r[index] = 1;

    return r;
}

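Note that if you use SparseAList, it will take up very little space. For example, if the double array has a length of 10K and only one index in it is set, the SparseAList will still report 10K items, but only one item is actually stored in memory. The collection is not hard to use; I suggest you give it a try.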

The same code using SparseAList:

public static SparseAList<double> GetNodeSrcRow3(string nodeName)
{
    SparseAList<double> r = new SparseAList<double>();

    r.InsertSpace(0, cpaths.Count); // allocates zero memory.

    List<int> indexes;
    if(!_dictionary.TryGetValue(nodeName, out indexes)) return r;

    foreach(var index in indexes) r[index] = 1;

    return r;
}
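The dictionary lookup can also be combined with the buffer reuse from Answer 1, so that each call touches only the matching indexes and allocates nothing. The following is a rough sketch rather than something either answer posted: `PathEdge`, `BuildIndex` and `GetRow` are hypothetical names, and it assumes the consumer is done with (or has copied) a row before the next call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedRowBuilder
{
    // Hypothetical stand-in for the objects stored in cpaths.
    public sealed class PathEdge
    {
        public string Src;
        public string Dst;
    }

    static Dictionary<string, List<int>> _index;
    static double[] _row;
    static List<int> _lastSet;                               // indexes set to 1 by the previous call
    static readonly List<int> Empty = new List<int>();

    // One pass over the paths: src -> list of positions where it occurs.
    public static void BuildIndex(IList<PathEdge> paths)
    {
        _index = paths.Select((p, i) => new { p.Src, Index = i })
                      .GroupBy(x => x.Src)
                      .ToDictionary(g => g.Key, g => g.Select(x => x.Index).ToList());
        _row = new double[paths.Count];
        _lastSet = Empty;
    }

    // Work per call is proportional to the number of matches, not to paths.Count.
    public static double[] GetRow(string nodeName)
    {
        foreach (int i in _lastSet) _row[i] = 0;             // clear the previous row's 1s

        List<int> hits;
        _lastSet = _index.TryGetValue(nodeName, out hits) ? hits : Empty;

        foreach (int i in _lastSet) _row[i] = 1;             // set this row's 1s
        return _row;
    }

    public static void Main()
    {
        var paths = new List<PathEdge>
        {
            new PathEdge { Src = "A", Dst = "B" },
            new PathEdge { Src = "B", Dst = "C" },
            new PathEdge { Src = "A", Dst = "C" },
            new PathEdge { Src = "C", Dst = "A" },
        };
        BuildIndex(paths);

        Console.WriteLine(string.Join(",", GetRow("A")));   // 1,0,1,0
        Console.WriteLine(string.Join(",", GetRow("C")));   // 0,0,0,1
        Console.WriteLine(string.Join(",", GetRow("Z")));   // 0,0,0,0 (unknown name)
    }
}
```

The same caveat as in Answer 1 applies: every call returns the same array, so this only helps if rows are consumed one at a time.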

Answer 3

You can use multithreading via the TPL's Parallel.For method.

static double[] GetNodeSrcRow3(string nodeName)
{
    double[] r = new double[cpaths.Count];
    Parallel.For(1, cpaths.Count, (i, state) =>
        {
            if (cpaths[i].src == nodeName) r[i] = 1;
        });
    return r;
}

Answer 4

Great answers!

If I may, I'd like to add some examples to the already good ones:


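System.Numerics is heavily optimized and also uses hardware acceleration. It is also thread-safe, at least according to what I have read about it.

For speed and scalability, a small piece of code can make a difference.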
