Question

I have a text file like this:

important unimportant
important unimportant
important unimportant
unimportant
unimportant
important unimportant
important unimportant   
important unimportant
unimportant
unimportant
important unimportant
important unimportant
important unimportant

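From this text file, I want to extract only the "important" part and store each group of three "important" parts on a single line, separated by commas. Then I want to create an array from those lines.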

I'm not very familiar with Python or with the packages related to text extraction.

I'm not sure how to approach this problem. I'd really appreciate your help.

Answer 1

As far as I understand the question, try:

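with open('file2.txt', 'r') as f:
    groups = []   # one comma-joined string per run of important lines
    current = []  # 'important' tokens collected for the current run
    # the extra '' acts as a sentinel so the last run is flushed at end of file
    for line in f.readlines() + ['']:
        # match the whole word, so a line holding only 'unimportant' never counts
        if 'important' in line.split():
            current.append('important')
        else:
            groups.append(', '.join(current))
            current.clear()
    # drop the empty strings produced by consecutive unimportant lines
    print(list(filter(None, groups)))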

Output:

['important, important, important', 'important, important, important', 'important, important, important']

Answer 2

You haven't shared much, but a few things are clear:

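you have some way of telling an important line from an unimportant one;
you want to read every line of the file;
you want consecutive "important" results grouped together.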

Looping over the file:

with open('myfile.txt', 'r') as f:
    for line in f:
        print(line, end='')  # do something with `line`

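You can collect the important lines in a list. Whenever you reach an unimportant line or the end of the file, add that list to the result if it contains any lines, then start a new one.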

Putting it all together:

def is_important(line):
    return 'important' in line.split()  # replace with an actual test


result = []
with open('myfile.txt', 'r') as f:
    important = []
    for line in f:
        if is_important(line):
            important.append(line)
        elif important:
            result.append(important)
            important = []
# done reading, add remaining important lines to result
if important:
    result.append(important)

print(result)

This code works for your example; just change is_important so that it performs a meaningful test.

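Note that the example code keeps the newline at the end of each line. There are several ways to get rid of it, depending on whether you want to read the whole file at once or one line at a time. That shouldn't be hard to figure out.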
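As a minimal sketch of the two approaches to dropping the newline, the snippet below uses an in-memory io.StringIO object as a stand-in for the real file. The sample text and the variable names are assumptions made for illustration, not part of the code above.

```python
import io

# Stand-in for open('myfile.txt'): an in-memory file with the same shape.
sample = "important unimportant\nunimportant\nimportant unimportant\n"

# Option 1: iterate line by line and strip the trailing newline from each line.
with io.StringIO(sample) as f:
    stripped = [line.rstrip('\n') for line in f]

# Option 2: read everything at once; splitlines() drops the line endings.
with io.StringIO(sample) as f:
    split = f.read().splitlines()

print(stripped)
print(split)
```

Both lists hold the same strings without line endings. In the loop above, the per-line variant would amount to appending line.rstrip('\n') instead of line.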

If you're looking for one of those short but hard-to-read solutions:

from itertools import groupby


def is_important(line):
    return 'important' in line.split()  # replace with an actual test


# the with block closes the file; groupby keeps only the runs whose key is True
with open('myfile.txt', 'r') as f:
    result = [list(group) for keep, group in groupby(f, key=is_important) if keep]

print(result)
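To see what groupby does here without a file on disk, the sketch below runs the same grouping on an in-memory list and joins the first word of each line with commas, which is the shape the question asks for. The sample lines, and the choice of the first word as the "important" part, are assumptions for illustration.

```python
from itertools import groupby

# Hypothetical sample standing in for the lines of myfile.txt.
lines = [
    "important unimportant", "important unimportant", "important unimportant",
    "unimportant", "unimportant",
    "important unimportant", "important unimportant",
]


def is_important(line):
    return 'important' in line.split()  # same placeholder test as above


# groupby yields (key, run) pairs for consecutive items sharing a key;
# keep only the runs whose key is True and join the first word of each line.
joined = [
    ', '.join(line.split()[0] for line in run)
    for keep, run in groupby(lines, key=is_important)
    if keep
]
print(joined)
```

Because groupby only merges adjacent items with equal keys, the two unimportant lines in the middle are what separates the first run from the second.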

Iterating over a text file with Python and storing groups of lines in separate arrays
