Question

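I'm just learning Python, so I'd appreciate any help. I have a two-column dataset: the first column is a unique ID, and the second is a string of items. I'm using networkX to build a tree from the data (see below). I need to know the frequency of each item at each level. For example, for the path in A (1, 2, 3, 4), the counts for the nodes should be 1: 4, 2: 2, 3: 2, and 4: 2. How do I get the node counts?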

My data looks like this:

A      1, 2, 3, 4
B      1, 2, 1, 4
C      1, 3, 4, 3
D      1, 4, 3, 2

My code so far is as follows:

import matplotlib.pyplot as plt
import networkx as nx

#create graph
G = nx.MultiGraph()

#read in strings from csv
testfile = 'C:…file.txt'

with open(testfile, "r") as f:
    # keep only lines that contain a tab between the ID and the path
    f = (i for i in f if '\t' in i.rstrip())
    for line in f:
        customerID, path = line.rstrip().split("\t")
        path2 =  path.rstrip("\\").rstrip("}").split(",")
        pathInt = list()
        for x in path2:
            if x is not None:
                newx = int(x)
                pathInt.append(newx)
                print(pathInt)
        varlength = len(pathInt)
        pathTuple = tuple(pathInt)
        # each node is a prefix of the path: (1,), (1, 2), (1, 2, 3), ...
        # nx.add_path(G, ...) is used because newer NetworkX releases dropped G.add_path
        nx.add_path(G, [pathTuple[:i+1] for i in range(0, varlength)])

nx.draw(G)
plt.show() # display

Answer 1

First, you can convert the list of strings to a tuple of ints more concisely:

pathTuple = tuple(int(x) for x in path2 )
# slice the tuple of ints (pathTuple), not the raw string `path`
nx.add_path(G, [pathTuple[:i+1] for i in range(0, len(pathTuple))])
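To see what that list comprehension produces, here is a minimal sketch using row A of the question's data. The helper name `prefixes` is hypothetical and only there for illustration; the graph itself is left out.

```python
def prefixes(path):
    """Return every leading prefix of `path`, shortest first."""
    return [path[:i + 1] for i in range(len(path))]

# Row A from the question; int() tolerates the space after each comma.
path_tuple = tuple(int(x) for x in "1, 2, 3, 4".split(","))

print(prefixes(path_tuple))
# [(1,), (1, 2), (1, 2, 3), (1, 2, 3, 4)]
```

Each prefix becomes one node of the tree, so two paths share a node exactly as long as they start with the same items.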

To store the count data, I would use a defaultdict inside a defaultdict: basically a data structure that allows double indexing and defaults to 0.

import collections
counts = collections.defaultdict(lambda:collections.defaultdict(lambda:0))

The `counts` structure can be accessed as counts[level][node]. We can then count how often each node appears at each level by looking at its position in the path.
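As a rough, self-contained sketch of that counting step, the snippet below skips the file reading and hard-codes the four sample rows from the question in a dictionary named `paths` (an assumption made for illustration):

```python
import collections

# The four sample paths from the question, keyed by ID.
paths = {
    "A": (1, 2, 3, 4),
    "B": (1, 2, 1, 4),
    "C": (1, 3, 4, 3),
    "D": (1, 4, 3, 2),
}

# counts[level][node] starts at 0 for any level/node not seen yet.
counts = collections.defaultdict(lambda: collections.defaultdict(int))
for path in paths.values():
    for level, node in enumerate(path):
        counts[level][node] += 1

# Frequencies along path A, one value per level.
print([counts[level][node] for level, node in enumerate(paths["A"])])
# [4, 2, 2, 2]
```

This matches the counts the question expects for path A (1: 4, 2: 2, 3: 2, 4: 2).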

With that in place, your code would look like this:

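import collections
import networkx as nx

#create graph
G = nx.MultiGraph()

# counts[level][node] -> number of paths that have `node` at position `level`
counts = collections.defaultdict(lambda:collections.defaultdict(lambda:0))

#read in strings from csv
testfile = 'C:…file.txt'

with open(testfile, "r") as f:
    f = (i for i in f if '\t' in i.rstrip())
    for line in f:
        customerID, path = line.rstrip().split("\t")
        path2 =  path.rstrip("\\").rstrip("}").split(",")
        pathTuple = tuple(int(x) for x in path2 )
        # nx.add_path(G, ...) is used because newer NetworkX releases dropped G.add_path
        nx.add_path(G, [pathTuple[:i+1] for i in range(0, len(pathTuple))])

        # iterate over the parsed ints, not the raw string `path`
        for level, node in enumerate(pathTuple):
            counts[level][node]+=1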

Then you can do this:

level = 0
node = 1
print('Node', node, 'appears', counts[level][node], 'times on level', level)
# Node 1 appears 4 times on level 0
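To inspect every level at once rather than a single (level, node) pair, something along these lines might work. It is only a sketch: the helper `level_report` is hypothetical, and the counts are rebuilt here with `collections.Counter` over the columns of the question's four sample paths, which assumes all paths have the same length.

```python
import collections

# Stand-in for the counts built while reading the file (sample data only).
sample_paths = [(1, 2, 3, 4), (1, 2, 1, 4), (1, 3, 4, 3), (1, 4, 3, 2)]

# zip(*sample_paths) yields one tuple per level, e.g. (1, 1, 1, 1) for level 0.
counts = {level: collections.Counter(column)
          for level, column in enumerate(zip(*sample_paths))}

def level_report(counts):
    """Return one line per level listing each node's count, nodes in sorted order."""
    lines = []
    for level in sorted(counts):
        pairs = ", ".join(f"{node}: {n}" for node, n in sorted(counts[level].items()))
        lines.append(f"level {level} -> {pairs}")
    return lines

print("\n".join(level_report(counts)))
# level 0 -> 1: 4
# level 1 -> 2: 2, 3: 1, 4: 1
# level 2 -> 1: 1, 3: 2, 4: 1
# level 3 -> 2: 1, 3: 1, 4: 2
```

The same `level_report` function also accepts the nested defaultdict from the answer, since it only relies on `sorted(...)` and `.items()`.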

Node frequency using networkx
