Question

I want to merge two lines of text into one, but only if neither of them is an empty line. For example:

1:1 Bob drives his car.
1:2 Bob and his wife are going on a trip. 
They will have an awesome time on the beach.

I want to put them into a dictionary like this:

dict[1:1] gives me "Bob drives his car."
and dict[1:2] must give me "Bob and his wife are going on a trip.They will have an awesome time on the beach."

I know how to solve the first case (dict[1:1]), but I don't know how to put the two sentences together.

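Or is there an option so that, whenever a sentence is followed by another sentence, they are put on one line? This is just an example; the actual file contains 100000 lines.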

Answer 1

You could do it like this: read the file one line at a time, and treat a blank line as the start of a new section.

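start_new_section = True
key = None
output = {}
with open('file.txt', 'r') as f:
    for line in f:
        # lines read from a file keep their '\n', so strip before testing for blank
        line = line.strip()
        if line == '':
            start_new_section = True
        elif start_new_section:
            # first line of a section: "<key> <text>"
            words = line.split(' ')
            key = words[0]
            output[key] = ' '.join(words[1:])
            start_new_section = False
        else:
            # continuation line: append it to the current entry
            output[key] += ' ' + line

print(output)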

Or a tidier version of the same idea:

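key = None
output = {}
with open('file.txt', 'r') as f:
    for line in f:
        line = line.strip()  # a blank line read from a file is '\n', not ''
        if not line:
            key = None  # blank line ends the current section
        elif key:
            output[key] += ' ' + line  # continuation line
        else:
            # targets are assigned left to right, so key is set before output[key]
            key, _, output[key] = line.partition(' ')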
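
The sample in the question has no blank line between 1:1 and 1:2, so a variant that also starts a new entry whenever a line begins with something that looks like a key may be closer to what is needed. The following is only a sketch under that assumption: the key format (digits, a colon, digits) is inferred from the sample, and the names merge_lines and KEY_PATTERN are made up for illustration.

```python
import re

# Assumption: every key looks like "<digits>:<digits>", as in the question's sample.
KEY_PATTERN = re.compile(r'(\d+:\d+)\s+(.*)')


def merge_lines(lines):
    """Map each key to its text, folding continuation lines into the entry."""
    output = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            key = None  # a blank line closes the current entry
            continue
        match = KEY_PATTERN.match(line)
        if match:
            # the line starts with a key, so it opens a new entry
            key, text = match.groups()
            output[key] = text
        elif key is not None:
            # no key at the start: continuation of the current entry
            output[key] += ' ' + line
        # lines without a key and without an open entry are ignored
    return output
```

Because the function accepts any iterable of lines, an open file object can be passed directly (for example merge_lines(f) inside a with open('file.txt') block), which keeps memory use low for a 100000-line file.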

Answer 2

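One possible way to solve this is to go through the file once and build a list of the indexes of the lines that start with a numeric value. You can then use that list to build your dictionary, since each pair of consecutive indexes delimits one item that should be inserted into the dictionary.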
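
The index idea described above could be sketched roughly as follows. This is an illustration rather than a definitive implementation: it assumes the lines fit in memory, that keys have the form digits:digits as in the question's sample, and the names build_dict_by_index and KEY_RE are hypothetical.

```python
import re

# Assumption: a line starts a new item when it begins with "<digits>:<digits>".
KEY_RE = re.compile(r'\d+:\d+\b')


def build_dict_by_index(lines):
    """Two passes: find where each item starts, then slice between the starts."""
    lines = [ln.strip() for ln in lines]
    # Pass 1: indexes of the lines that begin with a key.
    starts = [i for i, ln in enumerate(lines) if KEY_RE.match(ln)]
    result = {}
    # Pass 2: each item runs from its start index up to the next start (or the end).
    for begin, end in zip(starts, starts[1:] + [len(lines)]):
        key, _, first = lines[begin].partition(' ')
        # keep the non-empty continuation lines that belong to this item
        parts = [first] + [ln for ln in lines[begin + 1:end] if ln]
        result[key] = ' '.join(parts)
    return result
```

Called as build_dict_by_index(open('file.txt').read().splitlines()), this would give one dictionary entry per key, with the continuation lines joined by a single space.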

Answer 3

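Assuming the file is small enough that you can read the whole thing into memory, you can use a regular expression to parse the blocks. Here is an example in action.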

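import re

with open('file.txt', 'r') as f:
    txt = f.read()

# \Z anchors the last block at the end of the text
matches = re.findall(r'^(\d+:\d+) (.+?)$(?=(?:\s^\d+:\d+)|\Z)', txt, flags=re.M | re.S)
# remove real newline characters; r'\n' would only match a literal backslash followed by n
d = {m[0]: m[1].replace('\n', '') for m in matches}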
