I have a text file like this:
ID = 31
Ne = 5122
============
List of 104 four tuples:
1 2 12 40
2 3 4 21
.
.
51 21 41 42
ID = 34
Ne = 5122
============
List of 104 four tuples:
3 2 12 40
4 3 4 21
.
.
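The four-tuples are tab-separated.
For each ID, I am trying to build a dictionary with the ID as the key and the four-tuples (as a list/tuple) as the value for that key.
dict = {31: [(1, 2, 12, 40), (2, 3, 4, 21), ...], 34: [(3, 2, 12, 40), (4, 3, 4, 21), ...]}
My string-parsing knowledge is limited to reading the file with file.readlines() and then using str.replace() and str.split() on 'ID ='. But there must be a better way. Here is what I have so far.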
with open('text.txt', 'r') as file:
    fp = file.readlines()
B = []
for x in fp:
    # str.replace returns a new string, so the result has to be reassigned
    x = x.replace('\t', ',')
    x = x.replace('\n', ')')
    B.append(x)
Answer 0 (score: 2)
Something like this:
ll = []
for line in fp:
    tt = tuple(int(x) for x in line.split())
    ll.append(tt)
This produces a list of tuples that you can assign to your dictionary key.
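One thing to watch: as written, the loop above raises ValueError on any line that is not purely numeric, such as "ID = 31" or the "============" separator. A minimal sketch of one way to guard against that follows; the helper name `to_tuple` and the `sample` list are assumptions made up for illustration, not part of the original answer.

```python
def to_tuple(line):
    """Return the line as a tuple of ints, or None if it is not a pure data row."""
    try:
        return tuple(int(x) for x in line.split())
    except ValueError:
        return None  # header, separator, or any other non-numeric line


# Hypothetical sample lines in the same shape as the file in the question.
sample = [
    "ID = 31\n",
    "Ne = 5122\n",
    "============\n",
    "1\t2\t12\t40\n",
    "2\t3\t4\t21\n",
    "\n",
]

# `if t` drops both None (non-data lines) and () (blank lines).
ll = [t for t in map(to_tuple, sample) if t]
print(ll)  # [(1, 2, 12, 40), (2, 3, 4, 21)]
```

The same helper can be called inside the `for line in fp:` loop, appending only when the result is truthy.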
Answer 1 (score: 2)
Python is great for this kind of thing, so why not write a 5-10 liner for it? This is exactly what the language is good at.
$ cat test
ID = 31
Ne = 5122
============
List of 104 four tuples:
1 2 12 40
2 3 4 21
ID = 34
Ne = 5122
============
List of 104 four tuples:
3 2 12 40
4 3 4 21
data = {}
with open('test') as f:
    blocks = f.read().split('ID = ')
for block in blocks:
    if not block:
        continue  # empty string before the first 'ID = '
    lines = block.split('\n')
    ID = int(lines[0])
    # lines[1:4] are the Ne line, the separator and the "List of ..." header
    tups = [tuple(int(x) for x in line.split()) for line in lines[4:]]
    data[ID] = tuple(t for t in tups if t)  # drop empty tuples left by blank lines
print(data)
# {31: ((1, 2, 12, 40), (2, 3, 4, 21)), 34: ((3, 2, 12, 40), (4, 3, 4, 21))}
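The only annoying part is the filtering. Sorry about that; it just discards the empty strings and empty results left over from extra newlines and the like. For a small one-off script, it doesn't really matter.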
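If the filtering bothers you, one alternative is to keep only the lines that look like data rows. The sketch below swaps in a regular expression for that, which is not what the answer above does. It assumes every data row holds exactly four whitespace-separated non-negative integers; the names `TEXT`, `ROW` and `parse_blocks` are hypothetical, and it parses an in-memory string so it can be run without a file.

```python
import re

# Hypothetical sample text in the same shape as the file in the question.
TEXT = """ID = 31
Ne = 5122
============
List of 104 four tuples:
1\t2\t12\t40
2\t3\t4\t21
ID = 34
Ne = 5122
============
List of 104 four tuples:
3\t2\t12\t40
4\t3\t4\t21
"""

# Assumption: a data row is exactly four whitespace-separated integers.
ROW = re.compile(r"^\d+(?:\s+\d+){3}$")


def parse_blocks(text):
    """Map each ID to the list of four-tuples that follow it."""
    data = {}
    # [1:] skips whatever precedes the first 'ID = ' (normally an empty string).
    for block in text.split("ID = ")[1:]:
        lines = block.splitlines()
        rows = [
            tuple(map(int, ln.split()))
            for ln in lines[1:]
            if ROW.match(ln.strip())  # headers, separators and blanks fail here
        ]
        data[int(lines[0])] = rows
    return data


print(parse_blocks(TEXT))
```

Matching on the row shape means headers, separators and blank lines are all skipped by the same test, at the cost of silently ignoring any malformed data row.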
Answer 2 (score: 1)
I think this will do the trick for you:
def parse_file(filename):
    """
    Parses an input data file containing tags of the form "ID = ##" (where ## is a
    number) followed by rows of data. Returns a dictionary where the ID numbers
    are the keys and all of the rows of data are stored as a list of tuples
    associated with the key.
    Args:
        filename (string) name of the file you want to parse
    Returns:
        my_dict (dictionary) dictionary of data with ID numbers as keys
    """
    my_dict = {}
    with open(filename, "r") as my_file:  # handles opening and closing file
        rows = my_file.readlines()
    for row in rows:
        if "ID = " in row:
            my_key = int(row.split("ID = ")[1])  # grab the ID number
            my_list = []  # initialize a new data list for a new ID
            my_dict[my_key] = my_list  # store it now; data rows are appended below
        elif row != "\n":  # skip rows that only have newline char
            try:  # if this fails, we don't have a valid data line
                my_list.append(tuple(int(x) for x in row.split()))
            except ValueError:
                continue  # header or separator line, not data
    return my_dict
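I made it a function so you can call it from anywhere just by passing in the file name. It makes assumptions about the file format, but if the format is always what you have shown us here, it should work for you. You can call it on your data.txt file like this: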
a_dictionary = parse_file("data.txt")
I tested it on the data you gave us, and it seems to work fine after removing the "..." lines.
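Edit: I noticed a small bug. As originally written, it would add an empty tuple in place of a newline character ("\n") whenever one appeared on a line by itself. To fix this, put the try: and except: clauses inside this: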
elif row != "\n": # skips rows that only contain newline char
I have also added this to the full code above.
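To see where the empty tuple comes from: calling split() on a row that contains only "\n" returns an empty list, so the tuple built from it is empty as well. A small illustration, using made-up sample rows rather than the real file:

```python
# Hypothetical rows: two data rows with a blank line between them.
rows = ["1\t2\t12\t40\n", "\n", "2\t3\t4\t21\n"]

# Without the guard, the blank row becomes an empty tuple.
unguarded = [tuple(int(x) for x in row.split()) for row in rows]

# With the guard, rows that contain only a newline are skipped.
guarded = [tuple(int(x) for x in row.split()) for row in rows if row != "\n"]

print(unguarded)  # [(1, 2, 12, 40), (), (2, 3, 4, 21)]
print(guarded)    # [(1, 2, 12, 40), (2, 3, 4, 21)]
```

Note that the comparison with "\n" only catches completely empty lines; a line holding just spaces or a tab would still slip through, so `if row.strip()` may be a safer test if your file can contain those.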