Question

I have a file containing data like this, where '>' serves as the identifier.

>test1
this is line 1
hi there
>test2
this is line 3
how are you
>test3
this is line 5 and
who are you

I am trying to create a dictionary:

{'>test1':'this is line 1hi there','>test2':'this is line 3how are you','>test3':'this is line 5who are you'}

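I have imported the file, but I can't get it into this form. I want to remove the newline at the end of each line so that the lines under each identifier are joined into one. As shown, no space is needed between the joined lines. Any help would be appreciated.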

Here is what I have tried so far:

new_dict = {}
>>> db = open("/home/ak/Desktop/python_files/smalltext.txt")

for line in db:
    if '>' in line:
        new_dict[line]=''
    else:
        new_dict[line]=new_dict[line].append(line)

Answer 1

Using your approach, it would be:

new_dict = {}
with open("/home/ak/Desktop/python_files/smalltext.txt", 'r') as db:
    for line in db:
        if line.startswith('>'):
            key = line.strip()    # Strip the newline; this header is the current key
            new_dict[key] = ''
        else:
            # Append the stripped line to the most recent header's value
            new_dict[key] += line.strip()
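
To try this approach without the file on disk, here is a minimal sketch that wraps the same loop in a function and feeds it the sample data through `io.StringIO`. The names `parse_records` and `sample` are hypothetical, chosen only for illustration, and collecting the pieces in a list before joining is a small design change rather than part of the answer above.

```python
import io

def parse_records(lines):
    """Map each '>' header line to its following lines, concatenated."""
    records = {}
    key = None
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('>'):
            key = line
            records[key] = []
        elif key is not None:      # ignore any text before the first header
            records[key].append(line)
    # Join once per key instead of growing a string inside the loop
    return {k: ''.join(parts) for k, parts in records.items()}

# Hypothetical in-memory stand-in for the data file from the question
sample = io.StringIO(
    ">test1\nthis is line 1\nhi there\n"
    ">test2\nthis is line 3\nhow are you\n"
    ">test3\nthis is line 5 and\nwho are you\n"
)
result = parse_records(sample)
print(result['>test1'])  # this is line 1hi there
```

Because the function accepts any iterable of lines, an open file object can be passed in the same way as the `StringIO` object here.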

Answer 2

Here is a solution using groupby:

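from itertools import groupby

kvs = []
with open(f_name) as f:    # f_name is the path to the data file
    # Group consecutive lines by whether they are '>' header lines
    for k, v in groupby((e.rstrip() for e in f), lambda s: s.startswith('>')):
        kvs.append(''.join(v) if k else '\n'.join(v))

# kvs alternates header, body, header, body, ...
print(dict(zip(kvs[0::2], kvs[1::2])))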

The resulting dictionary:

{'>test1': 'this is line 1\nhi there',
 '>test2': 'this is line 3\nhow are you',
 '>test3': 'this is line 5 and\nwho are you'}
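
If the goal is the single-line value from the question rather than newline-joined text, the same groupby idea can join the body lines with an empty string. The following is only a sketch under that assumption: `group_records` and `sample` are hypothetical names, and it tracks the current header explicitly instead of pairing up a flat list.

```python
from itertools import groupby

def group_records(lines):
    """Build {header: concatenated body} using itertools.groupby."""
    stripped = (line.rstrip() for line in lines)
    result = {}
    header = None
    for is_header, chunk in groupby(stripped, key=lambda s: s.startswith('>')):
        if is_header:
            # Consecutive headers fall into one group; keep only the last
            header = list(chunk)[-1]
        elif header is not None:
            # Join body lines with no separator, as the question asks
            result[header] = ''.join(chunk)
    return result

# Hypothetical sample lines standing in for the file contents
sample = [
    ">test1\n", "this is line 1\n", "hi there\n",
    ">test2\n", "this is line 3\n", "how are you\n",
]
grouped = group_records(sample)
print(grouped)
```

Each group is consumed inside the loop body before the next one is requested, which matters because groupby invalidates a group's iterator once the outer iterator advances.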

Answer 3

You can use a regular expression:

import re

di = {}
# Capture each '>' header line and the text up to the next header or end of input
pat = re.compile(r'^(>.*?)$(.*?)(?=^>|\Z)', re.S | re.M)
with open(fn) as f:    # fn is the path to the data file
    txt = f.read()
    for m in pat.finditer(txt):
        di[m.group(1)] = m.group(2).strip()

print(di)

# {'>test1': 'this is line 1\nhi there', '>test2': 'this is line 3\nhow are you', '>test3': 'this is line 5 and\nwho are you'}
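
The regex result keeps the newline between body lines. To get the single-line values the question asks for, one option is to split each captured body into lines and join them with no separator. This is a sketch that reuses the pattern above on an in-memory string; `txt` and `flat` are hypothetical names for illustration.

```python
import re

# Same pattern as above: a '>' header line, then everything up to the
# next header line or the end of the input
pat = re.compile(r'^(>.*?)$(.*?)(?=^>|\Z)', re.S | re.M)

# Hypothetical stand-in for the text read from the data file
txt = ">test1\nthis is line 1\nhi there\n>test2\nthis is line 3\nhow are you\n"

# splitlines() drops the line breaks; join with '' to concatenate the pieces
flat = {m.group(1): ''.join(m.group(2).splitlines()) for m in pat.finditer(txt)}
print(flat)
```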

Importing a file as a dictionary in Python

3 answers: