I want to merge two lines of text into one, but only when neither of them is an empty line. For example:
1:1 Bob drives his car.
1:2 Bob and his wife are going on a trip.
They will have an awesome time on the beach.
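I want to put them into a dictionary like this:
dict['1:1'] gives me "Bob drives his car."
and dict['1:2'] should give me "Bob and his wife are going on a trip.They will have an awesome time on the beach."
I know how to handle the first case (dict['1:1']), but I don't know how to join the two sentences together.
Alternatively, is there a way to put a sentence on the same line as the one before it when it directly follows it? This is just one example; the real file contains 100,000 lines.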
Answer 0 (score: 1)
You can do it like this: read the file one line at a time, and treat an empty line as the start of a new section.
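start_new_section = True
key = None
output = {}
with open('file.txt', 'r') as f:
    for line in f:
        # Lines read from a file keep their trailing '\n'; strip it first.
        line = line.rstrip('\n')
        if not line.strip():
            # An empty (or whitespace-only) line ends the current section.
            start_new_section = True
        elif start_new_section:
            # First line of a section, e.g. "1:2 Bob and his wife ...".
            words = line.split(' ')
            key = words[0]
            output[key] = ' '.join(words[1:])
            start_new_section = False
        else:
            # Continuation line: append it to the current entry.
            output[key] += line
print(output)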
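To try the section-based loop without creating `file.txt`, here is a minimal sketch that wraps the same idea in a function (the name `parse_sections` is hypothetical) and feeds it an in-memory file through `io.StringIO`. It assumes, like the loop above, that entries are separated by blank lines and that the key is everything before the first space.

```python
import io


def parse_sections(lines):
    """Group lines into {key: text}; a blank line starts a new section.

    Assumption: each section's first line looks like "<key> <text>".
    """
    output = {}
    key = None
    for line in lines:
        line = line.rstrip('\n')
        if not line.strip():
            key = None  # blank line: the next non-blank line starts a new entry
        elif key is None:
            key, _, text = line.partition(' ')
            output[key] = text
        else:
            # Continuation line, joined without a separator as in the question.
            output[key] += line
    return output


sample = io.StringIO(
    "1:1 Bob drives his car.\n"
    "\n"
    "1:2 Bob and his wife are going on a trip.\n"
    "They will have an awesome time on the beach.\n"
)
result = parse_sections(sample)
print(result)
```

Because the function only iterates over its input, an open file object can be passed in directly, so a 100,000-line file is processed one line at a time instead of being loaded into memory.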
Or a tidier version of the same idea:
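key = None
output = {}
with open('file.txt', 'r') as f:
    for line in f:
        line = line.rstrip('\n')
        if not line.strip():
            key = None  # blank line: the next line starts a new entry
        elif key:
            output[key] += line  # continuation line
        else:
            # First line of an entry; `key` is bound before output[key] is assigned.
            key, _, output[key] = line.partition(' ')
print(output)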
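The last line of the tidier version relies on a detail of Python's assignment statement: the right-hand side is evaluated first, then the targets are assigned from left to right, so `key` already holds the new value when `output[key]` is assigned. A small standalone sketch (the sample line is taken from the question):

```python
output = {}
line = "1:2 Bob and his wife are going on a trip."

# str.partition returns (head, separator, tail); the targets are then
# assigned left to right, so output[key] uses the key bound just before it.
key, _, output[key] = line.partition(' ')

print(key)     # 1:2
print(output)  # {'1:2': 'Bob and his wife are going on a trip.'}
```

If that ordering feels too subtle, splitting it into `key, _, text = line.partition(' ')` followed by `output[key] = text` behaves the same way.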
Answer 1 (score: 0)
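One possible approach is to go through the file once and build an index: a list of the positions of the lines that begin with a numeric key. You can then use that index to build your dictionary, because the lines between each pair of consecutive positions in the index form one item that should be inserted into the dictionary.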
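A minimal sketch of that two-pass idea, assuming that an entry starts with a key shaped like `1:2` followed by a space (the pattern `KEY_RE` and the function `parse_by_index` are hypothetical names). Blank lines are simply skipped, so it works whether or not entries are separated by empty lines.

```python
import re

# Assumption: a new entry starts with a key such as "1:2" followed by a space.
KEY_RE = re.compile(r'\d+:\d+ ')


def parse_by_index(lines):
    lines = [line.rstrip('\n') for line in lines]
    # Pass 1: record the index of every line that starts a new entry.
    starts = [i for i, line in enumerate(lines) if KEY_RE.match(line)]
    starts.append(len(lines))  # sentinel so the last entry has an end index
    output = {}
    # Pass 2: each pair of consecutive indices delimits one entry.
    for begin, end in zip(starts, starts[1:]):
        key, _, first = lines[begin].partition(' ')
        rest = [line for line in lines[begin + 1:end] if line.strip()]
        output[key] = first + ''.join(rest)
    return output


sample = [
    "1:1 Bob drives his car.\n",
    "1:2 Bob and his wife are going on a trip.\n",
    "They will have an awesome time on the beach.\n",
]
result = parse_by_index(sample)
print(result)
```

This keeps every line in a list, which for a file of about 100,000 short lines is usually acceptable; any lines before the first key are ignored.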
Answer 2 (score: 0)
Assuming the file is small enough to read entirely into memory, you can use a regular expression to parse the blocks. Here is an example in action.
import re
with open('file.txt', 'r') as f:
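    txt = f.read()
# re.M: ^ and $ match at every line; re.S: . also matches newlines.
# \Z is Python's end-of-string anchor.
matches = re.findall(r'^(\d+:\d+) (.+?)$(?=(?:\s^\d+:\d+)|\Z)', txt, flags=re.M | re.S)
# Each match is a (key, text) tuple; remove the real newlines inside the text.
d = {m[0]: m[1].replace('\n', '') for m in matches}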
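A variant of the regex idea that can be run on an in-memory string, sketched under the assumption that every entry begins at the start of a line with a key like `1:2 ` (the names `ENTRY_RE` and `parse_with_regex` are hypothetical). The lazy `(.*?)` may span several lines because of `re.S`, and the lookahead stops it just before the next key or at the very end of the string (`\Z`).

```python
import re

# Assumption: each entry starts at the beginning of a line with "<digits>:<digits> ".
ENTRY_RE = re.compile(r'^(\d+:\d+) (.*?)(?=^\d+:\d+ |\Z)', flags=re.M | re.S)


def parse_with_regex(txt):
    # group(1) is the key; group(2) is the text up to the next key,
    # with its embedded newlines removed.
    return {m.group(1): m.group(2).replace('\n', '')
            for m in ENTRY_RE.finditer(txt)}


txt = (
    "1:1 Bob drives his car.\n"
    "\n"
    "1:2 Bob and his wife are going on a trip.\n"
    "They will have an awesome time on the beach.\n"
)
result = parse_with_regex(txt)
print(result)
```

As with the original answer, this needs the whole file as one string; a continuation line that itself happens to start with something like `3:16 ` would be mistaken for a new entry.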