Question

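I have a relatively small problem, but I can't get my head around it. I have a text file containing information about a graph, structured as follows: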

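The first line contains the number of nodes
A blank line follows as a separator
Then come the nodes, one block per node, with blocks separated by an empty line
Each block contains the node ID on the first line and the node type on the second line, followed by the edge information
There are two kinds of edges, up and down. The first number after the node type is the number of "up" edges, and their IDs follow on the next line (if that number is 0, there are no "up" edges and the next number is the count of "down" edges)
The same applies to the "down" edges: first their count, then their IDs on the line below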

So, sample data with three nodes is:

    3

    1
    1
    2
    2 3
    0

    2
    1
    0
    2
    1 3

    3
    2
    1
    1
    1
    2

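So node 1 has type 1, two up edges (2 and 3) and no down edges. Node 2 has type 1, zero up edges and two down edges (1 and 3). Node 3 has type 2, one up edge (1) and one down edge (2).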

This information is obviously human-readable, but I'm having trouble writing a parser that reads it and stores it in a usable form.

I wrote some sample code:

with open('C:\\data', 'r') as f:      # close the file automatically
    lines = f.readlines()

num_of_nodes = int(lines[0])           # first line: number of nodes
nodes = {}                             # chunk counter -> list of that chunk's lines
counter = 0
for line in lines[1:]:
    if line == "\n":                   # a blank line starts a new node chunk
        counter += 1
        nodes[counter] = []
        continue
    nodes[counter].append(line.replace("\n", ""))

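This way I get the information split per node. What I'd like is something like a dictionary that holds each ID together with its up and down neighbours (False if there are none). I suppose I could now parse this list of nodes again and handle each node on its own, but I'd like to know whether I can modify this loop so that it does the job properly in the first place.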
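The second pass mentioned above is not much extra work. A minimal sketch, assuming nodes is the dict built by the loop above (chunk counter mapped to that chunk's lines, newlines removed) and returning False for an empty neighbour list as requested; the helper names chunk_to_node and chunks_to_graph are hypothetical:

```python
def chunk_to_node(chunk):
    """Convert one chunk (a list of stripped lines) into (node_id, info)."""
    lines = iter(chunk)
    node_id = int(next(lines))
    node_type = int(next(lines))

    def read_edges():
        # A count line; if it is non-zero, the next line holds that many IDs.
        count = int(next(lines))
        if count == 0:
            return False  # the question asks for False when there are no neighbours
        return [int(x) for x in next(lines).split()]

    up = read_edges()    # "up" edges come first
    down = read_edges()  # then the "down" edges
    return node_id, {"type": node_type, "up": up, "down": down}


def chunks_to_graph(chunks):
    """Turn {chunk_counter: [lines]} into {node_id: {"type", "up", "down"}}."""
    return dict(chunk_to_node(chunk) for chunk in chunks.values())
```

Calling chunks_to_graph(nodes) after the loop gives the per-ID dictionary. To do it inside the loop instead, one could call chunk_to_node on the finished chunk each time a blank line is reached, plus once more after the loop for the last chunk.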

Answer 1

Is this what you're looking for?

{1: {'downs': [], 'ups': [2, 3], 'node_type': 1}, 
 2: {'downs': [1, 3], 'ups': [], 'node_type': 1}, 
 3: {'downs': [2], 'ups': [1], 'node_type': 2}}

Here's the code:

def parse_chunk(chunk):
    """Turn one node's lines into (node_id, info_dict)."""
    node_id = int(chunk[0])
    node_type = int(chunk[1])

    # Number of "up" edges; if non-zero, their IDs are on the next line.
    nb_up = int(chunk[2])
    if nb_up:
        ups = list(map(int, chunk[3].split()))
        next_pos = 4
    else:
        ups = []
        next_pos = 3

    # Same layout for the "down" edges.
    nb_down = int(chunk[next_pos])
    if nb_down:
        downs = list(map(int, chunk[next_pos + 1].split()))
    else:
        downs = []

    return node_id, dict(
        node_type=node_type,
        ups=ups,
        downs=downs,
        )

def collect_chunks(lines):
    """Group non-blank lines into chunks separated by blank lines."""
    chunk = []
    for line in lines:
        line = line.strip()
        if line:
            chunk.append(line)
        elif chunk:          # ignore runs of several blank lines
            yield chunk
            chunk = []
    if chunk:                # the last chunk has no trailing blank line
        yield chunk

def parse(stream):
    nb_nodes = int(next(stream).strip())   # first line: node count
    if not nb_nodes:
        return {}
    next(stream)                           # skip the blank separator line
    return dict(parse_chunk(chunk) for chunk in collect_chunks(stream))

def main(*args):
    with open(args[0], "r") as f:
        print(parse(f))

if __name__ == "__main__":
    import sys
    main(*sys.argv[1:])
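Since every count in the format says exactly how many IDs follow, the line structure is in fact redundant, and the chunking step can be skipped altogether. A minimal sketch of that alternative (a different technique from the chunk-based code above), assuming well-formed input in which all IDs and types are integers; parse_tokens is a hypothetical name:

```python
def parse_tokens(text):
    """Parse the graph format by consuming whitespace-separated tokens.

    Assumes well-formed input: the first token is the node count, and each
    node is id, type, up-count, up IDs, down-count, down IDs.
    """
    tokens = iter(text.split())

    def take():
        # Read the next token as an int.
        return int(next(tokens))

    nodes = {}
    for _ in range(take()):                     # first token: number of nodes
        node_id = take()
        node_type = take()
        ups = [take() for _ in range(take())]   # up-count, then that many IDs
        downs = [take() for _ in range(take())] # down-count, then that many IDs
        nodes[node_id] = dict(node_type=node_type, ups=ups, downs=downs)
    return nodes
```

Usage would be something like parse_tokens(f.read()) on the open file. Unlike the chunk-based version, it does not care whether one node's IDs are spread over several lines, but it also cannot detect a missing blank line between blocks.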

Answer 2

I'd do it as shown below. I'd also wrap the file reading in a try/except and use a with statement to read your file:

node_file = 'C:\\data'                       # path to your data file
nodes = {}
try:
    with open(node_file, 'r', encoding='utf-8') as file:
        lines = iter(file)                    # one iterator shared by the loop and next()
        next(lines)                           # skip first line (node count), not a node
        for line in lines:
            if line.strip() == '':            # a blank line starts a new node block
                node_id = int(next(lines))    # read node id
                nodes[node_id] = {}           # create a nested dict per node
                nodes[node_id]['type'] = int(next(lines))   # add node type
                if int(next(lines)) != 0:     # number of up edges
                    # there are many ways you can store edges; here a list
                    nodes[node_id]['up'] = [int(x) for x in next(lines).split()]
                if int(next(lines)) != 0:     # number of down edges
                    # store down edges as a list too
                    nodes[node_id]['down'] = [int(x) for x in next(lines).split()]
                # end of chunk/node set, let the for loop read the next line
            else:
                print("this should never happen! line:", line)
except OSError as err:
    print("could not read", node_file, "-", err)

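This reads the file one line at a time. I don't know how big your data file is, but this is easier on memory than loading everything at once. If memory is a concern, the trade-off is that it can be slower because of the disk reads (although an SSD works wonders).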

I haven't tested the code, but the concept should be clear :)
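On the memory point made in this answer: the nodes dict itself still grows with the file. If that matters, one option is a generator that hands back one node at a time. A sketch under the same well-formed-input assumption, using itertools.groupby to split the stream at blank lines (a swapped-in technique, not what the code above does); iter_nodes is a hypothetical name:

```python
from itertools import groupby


def iter_nodes(lines):
    """Yield (node_id, info) pairs one node at a time.

    `lines` can be an open file object; only the current block is held
    in memory, never the whole file or the whole graph.
    """
    lines = iter(lines)
    next(lines)  # first line: node count, not needed when streaming
    # groupby splits the stream into alternating runs of blank and non-blank lines.
    for has_data, group in groupby(lines, key=lambda line: bool(line.strip())):
        if not has_data:
            continue  # a run of blank separator lines
        # Flatten the block into tokens; each count says how many IDs follow.
        tokens = iter(" ".join(group).split())
        node_id = int(next(tokens))
        node_type = int(next(tokens))
        up = [int(next(tokens)) for _ in range(int(next(tokens)))]
        down = [int(next(tokens)) for _ in range(int(next(tokens)))]
        yield node_id, {"type": node_type, "up": up, "down": down}
```

Typical use would be to open the file with a with statement and write for node_id, info in iter_nodes(file): ..., processing each node as it arrives instead of storing them all.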

Parsing a graph data file with Python
