Question

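I am working with a graph that has 875713 nodes and 5105039 edges. Using vector<bitset<875713>> vec(875713) or array<bitset<875713>, 875713> gives me a segfault. I need to compute all-pairs shortest paths with path recovery. What alternative data structures do I have?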

I found this SO thread, but it doesn't answer my question.

Edit

After reading the suggestions I tried the following, and it seems to work. Thanks, everyone, for your help.

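vector<vector<uint>> neighboursOf; // An edge between i and j exists if
                                   // neighboursOf[i] contains j
neighboursOf.resize(nodeCount);

// Loop on getline itself: testing input.good() before reading processes
// one extra, empty line at end of file (adding a bogus edge 0 -> 0)
while (getline(input, line))
{
    // Skip empty lines and comments in the input file
    if (line.empty() || line[0] == '#')
        continue;

    // Each line is of the format "<fromNodeId> [TAB] <toNodeId>"
    uint fromNodeId = 0;
    uint toNodeId = 0;
    // %u matches unsigned int; skip lines without two IDs or with an out-of-range source
    if (sscanf(line.c_str(), "%u\t%u", &fromNodeId, &toNodeId) != 2
        || fromNodeId >= nodeCount)
        continue;

    // Store the edge
    neighboursOf[fromNodeId].push_back(toNodeId);
}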

Answer 1

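Your graph is sparse, i.e. |E| << |V|^2, so you should use a sparse-matrix representation of your adjacency matrix, or equivalently, store a list of neighbours for each node (which gives you a jagged array), like this: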

vector<vector<int> > V (number_of_nodes);
// For each cell of V, which is a vector itself, push only the indices of adjacent nodes.
V[0].push_back(2);   // Node number 2 is a neighbor of node number 0
...
V[number_of_nodes-1].push_back(...);

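This way, your expected memory requirement is O(|E| + |V|) instead of O(|V|^2), which in your case should be about 50 MB instead of roughly 96 GB (875713^2 bits).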
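To make the arithmetic concrete, here is a small sketch. The function names are hypothetical, and the adjacency-list figure is only a lower bound: it assumes one int per edge plus one std::vector header per node, ignoring allocator overhead and spare vector capacity.

```cpp
#include <cstdint>
#include <vector>

// Node and edge counts from the question.
constexpr std::uint64_t kNodes = 875713;
constexpr std::uint64_t kEdges = 5105039;

// Dense adjacency matrix (e.g. vector<bitset<N>>): one bit per ordered
// (i, j) pair, i.e. |V|^2 bits, converted to bytes.
constexpr std::uint64_t denseMatrixBytes(std::uint64_t v) {
    return v * v / 8;
}

// Adjacency lists: one int per edge plus one std::vector header per node.
// Lower bound only: allocator slack and unused capacity are ignored, and
// sizeof(std::vector<int>) is implementation-defined.
constexpr std::uint64_t adjacencyListBytes(std::uint64_t v, std::uint64_t e) {
    return e * sizeof(int) + v * sizeof(std::vector<int>);
}
```

With a 4-byte int and a 24-byte std::vector (typical for 64-bit builds), adjacencyListBytes(kNodes, kEdges) comes to about 41 MB, in line with the ~50 MB estimate once allocation slack is added, while denseMatrixBytes(kNodes) is about 95.9 GB.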

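This will also make Dijkstra (or any other shortest-path algorithm) faster, because at each step you only need to look at the current node's neighbours.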
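A minimal sketch of single-source shortest paths with path recovery on these adjacency lists. It assumes the edges are unweighted (the input format in the question carries no weights), so breadth-first search is used in place of Dijkstra; with weighted edges you would switch to Dijkstra with a priority queue. shortestPathTree, recoverPath and kNoParent are hypothetical names. Running it once per source gives all pairs, but keeping every source's parent array would again take |V|^2 ints, so results are best consumed or discarded per source.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency lists, as above
constexpr int kNoParent = -1;

// Hypothetical helper: BFS from `source` over an unweighted graph.
// parent[v] is v's predecessor on a shortest path from source;
// kNoParent for the source itself and for unreachable nodes.
std::vector<int> shortestPathTree(const Graph& adj, int source) {
    assert(source >= 0 && static_cast<std::size_t>(source) < adj.size());
    std::vector<int> parent(adj.size(), kNoParent);
    std::vector<bool> visited(adj.size(), false);
    std::queue<int> frontier;
    visited[source] = true;
    frontier.push(source);
    while (!frontier.empty()) {
        int u = frontier.front();
        frontier.pop();
        for (int v : adj[u]) {  // only u's neighbours are examined
            if (!visited[v]) {
                visited[v] = true;
                parent[v] = u;
                frontier.push(v);
            }
        }
    }
    return parent;
}

// Hypothetical helper: follow parent links back from `target`.
// Returns the nodes source..target, or an empty vector if unreachable.
std::vector<int> recoverPath(const std::vector<int>& parent, int source, int target) {
    std::vector<int> path;
    for (int v = target; v != kNoParent; v = parent[v]) {
        path.push_back(v);
    }
    if (path.back() != source) return {};  // chain did not reach source
    return std::vector<int>(path.rbegin(), path.rend());
}
```

Each BFS is O(|V| + |E|) time and O(|V|) extra memory, which stays in the tens of megabytes for this graph.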

Answer 2

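You can store the edge lists of all nodes in a single array. Since the number of edges per node varies, you can terminate each list with a null edge. This avoids the space overhead of many small lists (or similar data structures). The result might look like this: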

enum {
    MAX_NODES = 875713,
    MAX_EDGES = 5105039,
};

int nodes[MAX_NODES+1];         // contains index into array edges[].
                                // index zero is reserved as null node
                                // to terminate lists.

int edges[MAX_EDGES+MAX_NODES]; // contains null terminated lists of edges.
                                // each edge occupies a single entry in the
                                // array. each list ends with a null node.
                                // there are MAX_EDGES entries and MAX_NODES
                                // lists.

[...]

/* find edges for node (node IDs start at 1; 0 terminates each list) */
void for_each_edge(int node) {
    for (int edge_index = nodes[node]; edges[edge_index] != 0; edge_index++) {
        int edge = edges[edge_index];
        /* do something with edge... */
    }
}

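Since you have a large number of small data structures, minimizing the per-structure space overhead is important. The overhead of each node's list is just one integer, which is far less than that of, e.g., an STL vector. Moreover, the lists are laid out contiguously in memory, so no space is wasted between any two lists. That is not the case with variable-size vectors.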

Reading all edges of any given node will be very fast, because each node's edges are stored contiguously in memory.

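The drawback of this layout is that when you initialize the arrays and build the edge lists, you need all edges of each node at hand. That's no problem if the edges are sorted by node, but it doesn't work well if they come in random order.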
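A sketch of building this layout from an edge list that is already sorted by source node, under the same conventions as above (node IDs start at 1, and 0 terminates each list). It uses std::vector instead of fixed-size arrays so the sizes can come from the input; PackedGraph, buildPacked and outDegree are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container for the null-terminated layout.
struct PackedGraph {
    std::vector<int> nodes;  // nodes[n] = start of n's list in `edges`
    std::vector<int> edges;  // all lists back to back, each ending in 0
};

// Assumes node IDs 1..nodeCount and `sortedEdges` sorted by source node.
PackedGraph buildPacked(int nodeCount,
                        const std::vector<std::pair<int, int>>& sortedEdges) {
    PackedGraph g;
    g.nodes.assign(nodeCount + 1, 0);  // index 0 unused: 0 is the terminator
    g.edges.reserve(sortedEdges.size() + nodeCount);
    std::size_t next = 0;  // next unread entry in sortedEdges
    for (int n = 1; n <= nodeCount; ++n) {
        g.nodes[n] = static_cast<int>(g.edges.size());
        while (next < sortedEdges.size() && sortedEdges[next].first == n) {
            assert(sortedEdges[next].second != 0);  // 0 is reserved
            g.edges.push_back(sortedEdges[next].second);
            ++next;
        }
        g.edges.push_back(0);  // terminate n's list
    }
    // Fails if the input was unsorted or had a source outside 1..nodeCount.
    assert(next == sortedEdges.size());
    return g;
}

// Count the outgoing edges of `node` by scanning to the terminator.
int outDegree(const PackedGraph& g, int node) {
    int count = 0;
    for (int i = g.nodes[node]; g.edges[i] != 0; ++i) ++count;
    return count;
}
```

If the edges arrive in random order, sorting them first (std::sort on the pairs orders them by source node, then target) makes this work, at the cost of O(|E| log |E|) time and a temporary copy of the edge list.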

Answer 3

If we declare a node like this:

struct Node {
    int node_id;
    vector<int> edges; // all edges that start from this node
};

then all nodes can be represented as follows:

vector<Node> nodes; // std::array would need its size as a compile-time constant

How to store a very large graph space-efficiently but with fast indexing?
