Question

I have a text file like this:

ID = 31
Ne = 5122
============
List of 104 four tuples:
1    2    12    40
2    3    4     21
.
.
51   21   41    42   

ID = 34
Ne = 5122
============
List of 104 four tuples:
3    2    12    40
4    3    4     21
.
.

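The four-tuples are tab-separated.

For each ID, I'm trying to build a dictionary with the ID as the key and the four-tuples (as a list/tuple) as the value for that key.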

 dict = {31: [(1,2,12,40), (2,3,4,21), ...], 34: [(3,2,12,40), (4,3,4,21), ...]}

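My string-parsing knowledge is limited to reading the file with file.readlines() and using str.replace() and str.split() on 'ID = '. But there has to be a better way. Here is the beginning of what I have.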

with open('text.txt', 'r') as file:
    fp = file.readlines()
B = []
for x in fp:
    # str.replace returns a new string, so the result has to be reassigned
    x = x.replace('\t', ',')
    x = x.replace('\n', ')')
    B.append(x)

Answer 1

Something like this:

ll = []
for line in fp:
    tt = tuple(int(x) for x in line.split())
    ll.append(tt)

This will produce a list of tuples that you can assign to your dictionary key.
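As a minimal sketch of how that conversion could be tied to the IDs, assuming the layout shown in the question: header rows such as "Ne = 5122" would make int() raise ValueError, so the sketch skips them. The name group_by_id and the sample string are illustrative assumptions, not part of the answer above.

```python
def group_by_id(lines):
    """Group numeric rows under the most recent 'ID = N' line (hypothetical helper)."""
    result = {}
    current = None  # no ID seen yet
    for line in lines:
        line = line.strip()
        if line.startswith('ID ='):
            current = int(line.split('=')[1])
            result[current] = []
            continue
        try:
            tt = tuple(int(x) for x in line.split())
        except ValueError:
            continue  # header rows such as "Ne = 5122" or "====" are not data
        if tt and current is not None:  # ignore blank lines and rows before any ID
            result[current].append(tt)
    return result


# Sample text in the assumed layout; \t stands for the tab separators
sample = """ID = 31
Ne = 5122
============
List of 104 four tuples:
1\t2\t12\t40
2\t3\t4\t21

ID = 34
Ne = 5122
============
List of 104 four tuples:
3\t2\t12\t40
4\t3\t4\t21
"""

print(group_by_id(sample.splitlines()))
```

With a real file, the same function could be given the result of readlines(), since each line is stripped before it is inspected.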

Answer 2

Python is great for this kind of thing, so why not write a 5-10 liner for it? This is what the language excels at.

$ cat test
ID = 31
Ne = 5122
============
List of 104 four tuples:
1       2       12      40
2       3       4       21

ID = 34
Ne = 5122
============
List of 104 four tuples:
3       2       12      40
4       3       4       21


data = {}
for block in open('test').read().split('ID = '):
    if not block:
        continue  # empty chunk before the first 'ID = '
    lines = block.split('\n')
    ID = int(lines[0])
    # Convert each row to a list of ints; filter(None, ...) drops empty strings
    tups = [[int(y) for y in filter(None, line.split('\t'))] for line in lines[4:]]
    # Drop the empty lists produced by blank lines
    data[ID] = tuple(filter(None, tups))
print(data)

# {31: ([1, 2, 12, 40], [2, 3, 4, 21]), 34: ([3, 2, 12, 40], [4, 3, 4, 21])}

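The only annoying thing is all the filters. Sorry about that; they are only there to deal with the empty strings and other leftovers from extra newlines and the like. For a one-off little script, it doesn't matter.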
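If the filters bother you, one possible variation relies on str.split() with no arguments, which splits on any run of whitespace and never returns empty strings. This is only a sketch under the same layout assumption (three header rows after each ID line); parse_blocks is a hypothetical name, and the text is passed in as a string rather than read from the 'test' file.

```python
def parse_blocks(text):
    """Parse 'ID = N' blocks into {N: (row, row, ...)} without filter() calls."""
    data = {}
    for block in text.split('ID = '):
        if not block.strip():
            continue  # nothing before the first 'ID = '
        lines = block.splitlines()
        ID = int(lines[0])
        # lines[1:4] are assumed to be the "Ne", "====" and "List of" header rows
        rows = [line.split() for line in lines[4:]]
        # Blank lines split into an empty list, so "if row" skips them
        data[ID] = tuple([int(x) for x in row] for row in rows if row)
    return data


text = (
    "ID = 31\nNe = 5122\n============\nList of 104 four tuples:\n"
    "1\t2\t12\t40\n2\t3\t4\t21\n\n"
    "ID = 34\nNe = 5122\n============\nList of 104 four tuples:\n"
    "3\t2\t12\t40\n4\t3\t4\t21\n\n"
)
print(parse_blocks(text))
```

Because split() treats tabs and spaces alike, this version should also cope with rows that are padded with spaces instead of tabs.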

Answer 3

I think this will do the trick for you:

import csv

def parse_file(filename):
  """
  Parses an input data file containing tags of the form "ID = ##" (where ## is a
  number) followed by rows of data. Returns a dictionary where the ID numbers
  are the keys and all of the rows of data are stored as a list of tuples
  associated with the key.

  Args:
    filename (string) name of the file you want to parse

  Returns:
    my_dict (dictionary) dictionary of data with ID numbers as keys

  """
  my_dict = {}
  my_list = None  # no ID seen yet
  with open(filename, "r") as my_file:  # handles opening and closing file
    rows = my_file.readlines()
    for row in rows:
      if "ID = " in row:
        my_key = int(row.split("ID = ")[1])  # grab the ID number
        my_list = []  # initialize a new data list for a new ID
        my_dict[my_key] = my_list  # store the list now; rows are appended below
      elif row != "\n" and my_list is not None:  # skip rows that only have newline char
        try:  # if this fails, we don't have a valid data line
          my_list.append(tuple([int(x) for x in row.split()]))
        except ValueError:
          continue  # header rows such as "Ne = ..." are skipped
  return my_dict

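I made it a function so that you can call it from anywhere just by passing the filename. It makes assumptions about the file format, but if the format is always what you have shown us here, it should work for you. You would call it on your data.txt file like this: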

a_dictionary = parse_file("data.txt")

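I tested it on the data you gave us, and it seems to work fine after removing the "..." lines.

Edit: I noticed a small bug. As originally written, it would add an empty tuple in place of a newline character ("\n") whenever one appeared on a line by itself. To fix this, put the try: and except: clauses inside this: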

elif row != "\n":  # skips rows that only contain newline char

I have also added this to the full code above.

Parsing tricky strings with Python
