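I have a relatively small problem, but I can't get my head around it. I have a text file that contains information about a graph. Sample data with three nodes looks like this: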
3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2
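So node 1 has type 1, two up edges (2 and 3), and no down edges. Node 2 has type 1, zero up edges, and two down edges (1 and 3). Node 3 has type 2, one up edge (1), and one down edge (2).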
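Read this way, the layout is: the first line holds the node count, and each node is a block of lines (id, type, up-edge count, up-edge ids if any, down-edge count, down-edge ids if any) separated from the next block by a blank line. A minimal sketch, assuming exactly that layout (`SAMPLE` and `split_records` are hypothetical names, not from the question), that cuts the raw text into per-node blocks before interpreting any field:

```python
import re

# Sample file contents from the question, assuming blank lines between records.
SAMPLE = """3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2
"""


def split_records(text):
    """Return (node_count, records), where each record is a list of lines.

    Assumes the first block is the node count and that records are separated
    by one or more blank (or whitespace-only) lines.
    """
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    count = int(blocks[0])
    records = [[line.strip() for line in block.splitlines()] for block in blocks[1:]]
    return count, records


count, records = split_records(SAMPLE)
print(count)       # 3
print(records[0])  # ['1', '1', '2', '2 3', '0']
```

Each record still has a variable length (the id lines are present only when the count is non-zero), so the counts have to be read before the edge lines.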
This information is obviously human-readable, but I'm having trouble writing a parser that reads it and stores it in a usable form.
I've written some example code:
# Read all lines of the data file
with open('C:\\data', 'r') as f:
    lines = f.readlines()

num_of_nodes = int(lines[0])  # first line holds the number of nodes
nodes = {}
counter = 0
for line in lines[1:]:
    if line.strip() == "":
        # a blank line starts the next node's chunk
        counter += 1
        nodes[counter] = []
        continue
    nodes[counter].append(line.strip())
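This does give me the information split per node. What I'd like is something like a dictionary that holds each ID together with its up and down neighbours (False if there are none). I suppose I could now parse this list of nodes a second time and handle each node myself, but I'd like to know whether I can modify this loop so that it does the job properly in the first pass.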
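To address the "can I do it in the first loop" part: a minimal single-pass sketch, assuming the layout in the sample above. `parse_graph` and `_read_edges` are hypothetical names. Instead of splitting on blank lines, it ignores them and trusts the counts in the file to decide how many lines belong to each node, storing False for an empty neighbour list as requested. The length check on the id line is an extra validation step, not something the question asks for.

```python
import io


def _read_edges(tokens):
    """Read an edge count and, if it is non-zero, the line of neighbour ids."""
    count = int(next(tokens))
    if count == 0:
        return False  # no neighbours in this direction
    ids = [int(x) for x in next(tokens).split()]
    if len(ids) != count:
        raise ValueError(f"expected {count} ids, got {ids}")
    return ids


def parse_graph(stream):
    """Parse the whole file in one pass into {id: {'type', 'up', 'down'}}."""
    # Only non-blank lines carry data; the counts say how many belong to a node.
    tokens = (line.strip() for line in stream if line.strip())
    nb_nodes = int(next(tokens))
    nodes = {}
    for _ in range(nb_nodes):
        node_id = int(next(tokens))
        node_type = int(next(tokens))
        up = _read_edges(tokens)
        down = _read_edges(tokens)
        nodes[node_id] = {"type": node_type, "up": up, "down": down}
    return nodes


sample = "3\n\n1\n1\n2\n2 3\n0\n\n2\n1\n0\n2\n1 3\n\n3\n2\n1\n1\n1\n2\n"
print(parse_graph(io.StringIO(sample))[1])  # {'type': 1, 'up': [2, 3], 'down': False}
```

With a real file, pass the open file object directly, e.g. `with open(path) as f: graph = parse_graph(f)`. Because blank lines are skipped, the same function also works if the separators are missing or doubled.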
Answer 0 (score: 2)
Is this what you want?
{1: {'downs': [], 'ups': [2, 3], 'node_type': 1},
2: {'downs': [1, 3], 'ups': [], 'node_type': 1},
3: {'downs': [2], 'ups': [1], 'node_type': 2}}
And here is the code:
def parse_chunk(chunk):
    """Turn one node's lines into (node_id, {node_type, ups, downs})."""
    node_id = int(chunk[0])
    node_type = int(chunk[1])
    nb_up = int(chunk[2])
    if nb_up:
        ups = list(map(int, chunk[3].split()))
        next_pos = 4
    else:
        ups = []
        next_pos = 3
    nb_down = int(chunk[next_pos])
    if nb_down:
        downs = list(map(int, chunk[next_pos + 1].split()))
    else:
        downs = []
    return node_id, dict(
        node_type=node_type,
        ups=ups,
        downs=downs,
    )


def collect_chunks(lines):
    """Group non-blank lines into chunks separated by blank lines."""
    chunk = []
    for line in lines:
        line = line.strip()
        if line:
            chunk.append(line)
        elif chunk:  # skip runs of consecutive blank lines
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def parse(stream):
    nb_nodes = int(next(stream).strip())
    if not nb_nodes:
        return {}
    next(stream)  # skip the blank line after the node count
    return dict(parse_chunk(chunk) for chunk in collect_chunks(stream))


def main(*args):
    with open(args[0], "r") as f:
        print(parse(f))


if __name__ == "__main__":
    import sys
    main(*sys.argv[1:])
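As an aside, the chunking step can also be written with `itertools.groupby` from the standard library, which groups consecutive lines by whether they are blank. This is an alternative sketch, not part of the original answer: it yields one list per run of non-blank lines, and a run of several blank lines never produces an empty chunk.

```python
from itertools import groupby


def collect_chunks(lines):
    """Yield lists of stripped, non-blank lines, split on blank lines.

    groupby() groups consecutive lines by truthiness (non-empty vs. empty);
    the empty groups are the separators and are dropped.
    """
    stripped = (line.strip() for line in lines)
    for is_data, group in groupby(stripped, key=bool):
        if is_data:
            yield list(group)


lines = ["1\n", "1\n", "2\n", "2 3\n", "0\n", "\n", "\n", "2\n", "1\n", "0\n"]
print(list(collect_chunks(lines)))
# [['1', '1', '2', '2 3', '0'], ['2', '1', '0']]
```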
Answer 1 (score: 1)
I would do it as shown below. I'd add a try/except around the file reading and use the with statement:
nodes = {}
with open(node_file, 'r', encoding='utf-8') as file:  # node_file: path to the data file
    file.readline()  # skip first line (node count), not a node
    for line in file:
        if line.strip() == "":
            line = file.readline()  # read next line: the node id
            if not line.strip():
                continue  # blank line(s) at the end of the file
            node_id = int(line)
            nodes[node_id] = {}  # create a nested dict per node
            line = file.readline()
            nodes[node_id]['type'] = int(line)  # add node type
            line = file.readline()  # number of up edges
            if int(line) != 0:
                line = file.readline()
                # there are many ways you can store edges; here as a list of ints
                nodes[node_id]['up'] = [int(x) for x in line.split()]
            else:
                nodes[node_id]['up'] = []
            line = file.readline()  # number of down edges
            if int(line) != 0:
                line = file.readline()
                nodes[node_id]['down'] = [int(x) for x in line.split()]  # store down edges as a list
            else:
                nodes[node_id]['down'] = []
            # end of chunk/node set; let the for loop read the next line
        else:
            print("this should never happen! line: ", line)
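This reads the file line by line. I don't know how big your data file is, but this is easier on memory than loading everything at once. The trade-off, if memory is a concern, is possibly slower disk reads (although SSDs work wonders).
I haven't tested the code, but the concept should be clear :)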
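The answer mentions wrapping the file reading in a try/except, but the code above only uses a with statement. A minimal sketch of that idea, assuming you want to report the error and carry on rather than let the exception propagate (`iter_lines` is a hypothetical helper name). It still yields one line at a time, so the memory argument above still applies.

```python
def iter_lines(path):
    """Yield the file's lines one at a time; report I/O errors instead of crashing.

    OSError covers a missing file, missing permissions and similar problems,
    whether they happen when opening the file or while reading it.
    """
    try:
        with open(path, "r", encoding="utf-8") as file:
            for line in file:
                yield line
    except OSError as exc:
        print(f"could not read {path!r}: {exc}")


for line in iter_lines("no_such_file.txt"):  # prints an error, yields nothing
    print(line, end="")
```

Whether printing and continuing is the right reaction depends on the program; re-raising with a clearer message is an equally valid choice.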