Question

s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

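I want to convert the string above into the formatted dictionary below. The irregular spacing and the extra text are intentional. The spacing and the noted text should be stripped/removed.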

print(my_dict)
{'1': {'VALUE_1': '0.1', 'VALUE_2': '300'}, '2': {'VALUE_1': '0.2', 'VALUE_2': '400'}, '3': {'VALUE_1': '0.9', 'VALUE_2': '600'}}

Here is what I have tried so far:

s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

# Get the column names and assign them to a variable.
columns = s.lstrip().splitlines()[0]  # first line of the string (the header)

dct = {}

rows = s.lstrip().splitlines()

for data in rows[1:]:
    row = data.split()
    dct[row[0]] = dict(zip(columns[1:], row[1:]))

print(dct)

This ends up printing an ugly, malformed dictionary:

{'1': {'D': '0.1', '#': '300'}, '2': {'D': '0.2', '#': '400', ' ': 'in', 'V': 'row', 'A': '2', 'L': 'but', 'U': 'needs', 'E': 'to', '_': 'be', '1': 'C', '2': 'ignored'}, '3': {'D': '0.9', '#': '600'}}

With my current loop I have not been able to strip out the whitespace and the extra chunk of data on row 2.

Answer 1

A regex solution, which seems neater to me:

>>> import re
>>> from pprint import pprint
>>> pprint([{i[0]:{'VALUE_1': i[1], 'VALUE_2': i[2]}}
...     for i in re.findall(r'^\s*(\d+)\s+(\S+)\s+(\d+)', s, re.M)])
[{'1': {'VALUE_1': '0.1', 'VALUE_2': '300'}},
 {'2': {'VALUE_1': '0.2', 'VALUE_2': '400'}},
 {'3': {'VALUE_1': '0.9', 'VALUE_2': '600'}}]

You can check how the regex works here.
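
The snippet above returns a list of one-entry dictionaries, while the question asks for a single nested dictionary. A minimal sketch of the same regex feeding a dict comprehension might look like this; the names `row_pattern` and `my_dict` are illustrative, and the column names are hard-coded on the assumption that the header never changes.

```python
import re

s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

# ^\s*(\d+)  : the numeric ID at the start of a line (re.M makes ^ match per line)
# \s+(\S+)   : the first value, any run of non-whitespace characters
# \s+(\d+)   : the second value, digits only; trailing text is simply not captured
row_pattern = re.compile(r'^\s*(\d+)\s+(\S+)\s+(\d+)', re.M)

# Column names are hard-coded here (an assumption for this sketch).
my_dict = {
    row_id: {'VALUE_1': value_1, 'VALUE_2': value_2}
    for row_id, value_1, value_2 in row_pattern.findall(s)
}

print(my_dict)
```

The header line is skipped automatically because it does not start with digits, so no separate slicing of the rows is needed.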

Answer 2

There is a small mistake in your code.

columns = s.lstrip().splitlines()[0]

does not produce a list. Use:

columns = s.lstrip().splitlines()[0].split()

After this change, your code should run fine.

Also, as an improvement, you don't need the columns variable at all. Just replace it with rows[0] (split the same way).
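
For reference, a sketch of the asker's loop with the header taken from the first row and split into a list could look like the following. The names `header` and `fields` are illustrative, and it assumes every data row has at least an ID and two values separated by whitespace.

```python
s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

rows = s.strip().splitlines()

# Split the header into a list of names: ['ID#', 'VALUE_1', 'VALUE_2'].
# Without .split(), slicing would iterate over single characters instead.
header = rows[0].split()

dct = {}
for line in rows[1:]:
    fields = line.split()  # split() with no argument collapses runs of whitespace
    # zip() stops at the shorter sequence, so any tokens after the
    # second value (the extra text on row 2) are dropped.
    dct[fields[0]] = dict(zip(header[1:], fields[1:]))

print(dct)
```

Because `zip` truncates to the two column names, the trailing text on row 2 never reaches the dictionary, so no explicit cleanup step is required.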
