Question

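I'm trying to read a Pajek partition file (in other words, a .clu file) with the NetworkX Python library, but I can't figure out how. I can read a Pajek network (.net format) with the read_pajek method, but I can't find a way to read .clu files.

Thanks a lot!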

Answer 1

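A .clu file follows this format:

First line: *Vertices NUMBER_OF_VERTICES
Second line: partition of vertex 0
Third line: partition of vertex 1

And so on, until all NUMBER_OF_VERTICES vertices have been assigned to a partition.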
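To make the layout concrete, here is a minimal sketch that reads the header and the per-vertex labels from an in-memory string. The sample content and the variable names are made up for illustration, and vertex indices are 0-based to match the description above.

```python
# Hypothetical sample: 5 vertices, each line after the header is one label.
clu_text = """*Vertices 5
1
1
2
1
2
"""

lines = clu_text.splitlines()
header, count = lines[0].split()
if header.lower() != "*vertices":
    raise ValueError("expected a '*Vertices N' header line")

# labels[i] is the partition label of vertex i (0-based).
labels = [int(value) for value in lines[1:1 + int(count)]]
print(labels)  # [1, 1, 2, 1, 2]
```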

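Judging from the community detection algorithms in networkx (https://networkx.github.io/documentation/stable/reference/algorithms/community.html), the preferred format in networkx is an iterable (i.e. a list or tuple) that groups the vertex numbers of each partition, for example:

[[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]

This means the first partition consists of vertices 0, 1, 2, 3 and 4.

So reading a .clu file comes down to converting the file into that structure.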
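The conversion itself can be sketched in a few lines. The helper name group_by_label is hypothetical, and the label list is chosen so that it reproduces the example structure above.

```python
from collections import defaultdict


def group_by_label(labels):
    """Group vertex indices by partition label (hypothetical helper)."""
    groups = defaultdict(list)
    for vertex, label in enumerate(labels):
        groups[label].append(vertex)
    # Dicts keep insertion order, so communities come out in the order
    # in which their labels first appear.
    return list(groups.values())


# One label per vertex, as it would be read from a .clu file.
labels = [0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 2]
partition = group_by_label(labels)
print(partition)  # [[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]
```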

I took the read_pajek function from https://networkx.github.io/documentation/networkx-1.10/_modules/networkx/readwrite/pajek.html#read_pajek and adapted it into a working read_pajek_clu function (it needs defaultdict imported from collections).

from collections import defaultdict


def parse_pajek_clu(lines):
    """Parse Pajek format partition from string or iterable.

    Parameters
    ----------
    lines : string or iterable
       Data in Pajek partition format.

    Returns
    -------
    communities : list of lists
       The vertex numbers (0-based, in file order) in each community.

    See Also
    --------
    read_pajek_clu()
    """
    if isinstance(lines, str):
        lines = lines.split('\n')
    lines = iter([line.rstrip('\n') for line in lines])

    communities = defaultdict(list)
    for line in lines:
        if not line.lower().startswith("*vertices"):
            break  # only the "*Vertices N" section is understood
        _, nnodes = line.split()
        # The next N lines hold one community label per vertex.
        for vertex in range(int(nnodes)):
            communities[int(next(lines))].append(vertex)

    return list(communities.values())

You can see a working example in this repository:

https://github.com/joaquincabezas/networkx_pajek_util

Also, once you have the partition, you can draw it using an idea like the one proposed by Paul Broderson:

how to draw communities with networkx
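One step that drawing by community usually needs is a per-node community index that can serve as a color. The following is only a sketch of that step, not the method from the linked answer; the helper name community_index_per_node is hypothetical.

```python
def community_index_per_node(partition):
    """Map each node to the index of its community (hypothetical helper)."""
    return {node: index
            for index, community in enumerate(partition)
            for node in community}


partition = [[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]
membership = community_index_per_node(partition)

# One integer per node, sorted by node number. This assumes the graph's
# nodes are exactly these integers; otherwise build the list in the
# graph's own node order.
colors = [membership[node] for node in sorted(membership)]
print(colors)  # [0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 2]
```

With networkx and matplotlib installed, such a list could be passed as the node_color argument of nx.draw, provided its order matches the node order of the graph.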

I hope this helps!
