Can't create a dictionary from a file because of a line break

Time: 2019-04-02 20:42:33

Tags: python

I have a file that looks like this:

Mother Jane
Father Bob
Friends Ricky,Jack,Brian,Jordan, \
        Ricardo,Sonia,Blake

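As you can see, the first "Friends" line ends with a backslash, so that entry continues on the next line. When I try to parse this file into a dictionary, my current code raises an error.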

I searched online for a solution and tried several approaches, but none of them seem to work.

with open('./file.txt') as f:
    content = f.readlines()

    dic = {}
    for line in content:
        line_items = line.strip().split()
        if len(line_items) <= 2:
            dic[line_items[0]] = line_items[1]
        else:
            dic[line_items[0]] = line_items[1:]

I want a result that looks like this:

result = {"Mother": "Jane", "Father": "Bob", "Friends": ["Ricky", "Jack", "Brian", "Jordan", "Ricardo", "Sonia", "Blake"]}

Instead, I get an IndexError.
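To see where the IndexError comes from, you can run the question's split on each physical line. This sketch inlines the sample file as a string instead of reading ./file.txt, and assumes the file contains exactly the lines shown above:

```python
# Inline copy of the sample file (assumption: exactly the lines shown above).
sample = (
    "Mother Jane\n"
    "Father Bob\n"
    "Friends Ricky,Jack,Brian,Jordan, \\\n"
    "        Ricardo,Sonia,Blake\n"
)

# Reproduce the question's per-line split to see what each physical line yields.
splits = [line.strip().split() for line in sample.splitlines()]
for items in splits:
    print(len(items), items)
```

The continuation line produces a single item. For that line, `len(line_items) <= 2` is true, so `line_items[1]` raises IndexError. The "Friends" line also keeps the stray backslash as one of its list items.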

3 answers:

Answer 0 (score: 2)

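The following seems to work. It collects continued lines into a single logical line and then processes that line. It also avoids reading the whole file into memory.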

from pprint import pprint

dic = {}
with open('./newline_file.txt') as f:
    lst = []
    for line in f:  # iterating the file object streams it one line at a time
        line = line.strip()
        if line.endswith('\\'):  # Ends with backslash? (also safe for blank lines)
            lst.append(line[:-1].rstrip())  # drop the backslash and any spaces before it
            continue
        else:
            lst.append(line)
            logical_line = ''.join(lst)
            lst = []

        line_items = logical_line.split(' ')
        if len(line_items) == 2:
            if ',' in line_items[1]:
                dic[line_items[0]] = line_items[1].split(',')
            else:
                dic[line_items[0]] = line_items[1]

pprint(dic)

Output:

{'Father': 'Bob',
 'Friends': ['Ricky', 'Jack', 'Brian', 'Jordan', 'Ricardo', 'Sonia', 'Blake'],
 'Mother': 'Jane'}
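The same logical-line idea can be factored into a generator. This keeps the joining logic reusable, and the file is still read one line at a time. This is a sketch and not the answer's code. `logical_lines` and `parse` are hypothetical helper names, `str.partition` replaces the `len(line_items) == 2` check, and the sample input is wrapped in `io.StringIO` instead of coming from a real file:

```python
import io
from typing import Iterable, Iterator


def logical_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield logical lines, joining physical lines that end with a backslash.

    Hypothetical helper: accepts any iterable of strings (e.g. an open file),
    so input is consumed one line at a time.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if line.endswith("\\"):
            # Drop the backslash and any whitespace before it, then keep collecting.
            parts.append(line[:-1].rstrip())
            continue
        parts.append(line)
        yield "".join(parts)
        parts = []
    if parts:
        # Input ended on a continuation line: emit what was collected.
        yield "".join(parts)


def parse(lines: Iterable[str]) -> dict:
    """Hypothetical helper: build the key -> value(s) dictionary."""
    result = {}
    for logical in logical_lines(lines):
        if not logical:
            continue  # skip blank lines
        key, _, value = logical.partition(" ")
        value = value.strip()
        # Comma-separated values become a list; single values stay a string.
        result[key] = value.split(",") if "," in value else value
    return result


sample = io.StringIO(
    "Mother Jane\n"
    "Father Bob\n"
    "Friends Ricky,Jack,Brian,Jordan, \\\n"
    "        Ricardo,Sonia,Blake\n"
)
print(parse(sample))
```

`parse` accepts any iterable of strings, so it works the same way on an open file object (`with open('./newline_file.txt') as f: dic = parse(f)`).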

Answer 1 (score: 0)

You could use something like this:

import re

with open('file.txt') as f:
    c = f.read().strip()

# join continuation lines: a comma, an optional trailing backslash,
# then the line break and the next line's indentation
c = re.sub(r",\s*\\?\n\s*", ",", c)

final_dict = {}
for line in c.splitlines():
    k, v = line.split()
    if "," in v:
        final_dict[k] = v.split(",")
    else:
        final_dict[k] = v

print(final_dict)

Output:

{'Mother': 'Jane', 'Father': 'Bob', 'Friends': ['Ricky', 'Jack', 'Brian', 'Jordan', 'Ricardo', 'Sonia', 'Blake']}
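The regex approach can also target the backslash itself instead of the comma, so a continuation works anywhere on the line. This sketch assumes a continuation is always a backslash immediately followed by the newline (no spaces after the backslash). The sample text is inlined rather than read from file.txt:

```python
import re

text = (
    "Mother Jane\n"
    "Father Bob\n"
    "Friends Ricky,Jack,Brian,Jordan, \\\n"
    "        Ricardo,Sonia,Blake\n"
)

# Replace every backslash continuation (optional spaces/tabs, the backslash,
# the newline, then the next line's indentation) with a single space.
joined = re.sub(r"[ \t]*\\\n[ \t]*", " ", text)

final_dict = {}
for line in joined.splitlines():
    if not line.strip():
        continue  # ignore blank lines
    key, value = line.split(maxsplit=1)
    if "," in value:
        # strip() removes the space left where the continuation was joined
        final_dict[key] = [v.strip() for v in value.split(",")]
    else:
        final_dict[key] = value

print(final_dict)
```

Joining with a single space rather than an empty string prevents a break between two words from gluing them together. The `strip()` on each list item then removes that extra space inside the comma-separated list.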


Answer 2 (score: 0)

You can accumulate lines that end with a continuation backslash and process a line only once it is complete:

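with open('./file.txt') as f:
    content = f.readlines()  # same as in the question

dic = {}
continued = ""
for line in content:
    if "\\" in line:  # continues on the next line: keep the part before the backslash
        continued += line.split("\\")[0]
        continue
    # the extra " " guarantees split() returns two parts even for a bare key
    key, value = (continued + line + " ").split(" ", 1)
    continued = ""
    value = [v.strip() for v in value.strip().split(",") if v.strip()]
    dic[key] = value[0] if len(value) == 1 else value

print(dic) # {'Mother': 'Jane', 'Father': 'Bob', 'Friends': ['Ricky', 'Jack', 'Brian', 'Jordan', 'Ricardo', 'Sonia', 'Blake']}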