Question

I have been trying to write the following function:

def track(filepath, n1, n2):

The function should operate on files with the following format:

-BORDER-
text
-BORDER-
text
-BORDER-
text
-BORDER-

How do I tell the function to operate on this file path, and more precisely on the text inside each pair of borders?

Answer 1

The following approach reads your file and gives you a list of the non-border lines for each block:

from itertools import groupby

with open('input.txt') as f_input:
    for k, g in groupby(f_input, lambda x: not x.startswith('-BORDER-')):
        if k:
            print([line.strip() for line in g])

So if your input file is:

-BORDER-
text
-BORDER-
text
-BORDER-
this is some text
with words 
on different lines
-BORDER-

It will display the following output:

['text']
['text']
['this is some text', 'with words', 'on different lines']

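This works by reading your file line by line and using Python's groupby function to group consecutive lines that give the same result for a given test. In this case, the test is whether the line does not start with -BORDER-. groupby hands back each run of consecutive lines that produce the same result. k is the result of the test and g is the group of matching lines. So if k is True, the lines in g do not start with -BORDER-.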
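To make k and g concrete, here is a minimal sketch that runs the same groupby call on an in-memory list instead of a file. The list `lines` is made-up sample data, not part of the original answer.

```python
from itertools import groupby

# Hypothetical sample data standing in for the lines of the file.
lines = ['-BORDER-\n', 'a\n', 'b\n', '-BORDER-\n', '-BORDER-\n', 'c\n', '-BORDER-\n']

# Same key as in the answer: True for text lines, False for border lines.
# Each group is turned into a list so it can be printed and inspected.
groups = [(k, list(g))
          for k, g in groupby(lines, lambda x: not x.startswith('-BORDER-'))]

for k, g in groups:
    print(k, g)
```

Note that two border lines in a row end up in a single False group, so an empty block between them produces no True group at all.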

Next, because each line ends with a newline, a list comprehension is used to strip it from each returned line.
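As a small illustration of that step, the sketch below compares strip() with rstrip('\n') on made-up sample lines. Which one you want depends on whether other leading or trailing whitespace matters to you.

```python
# Hypothetical sample lines, as they would come out of the file.
raw = ['this is some text\n', 'with words \n', 'on different lines\n']

# strip() removes the newline and any other leading/trailing whitespace.
stripped = [line.strip() for line in raw]

# rstrip('\n') removes only the trailing newline and keeps other spaces.
newline_only = [line.rstrip('\n') for line in raw]

print(stripped)
print(newline_only)
```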

If you want to count the words (assuming they are separated by whitespace), you could do the following:

from itertools import groupby

with open('input.txt') as f_input:
    for k, g in groupby(f_input, lambda x: not x.startswith('-BORDER-')):
        if k:
            lines = list(g)
            word_count = sum(len(line.split()) for line in lines)
            print("{} words in {}".format(word_count, lines))

Giving you:

1 words in ['text\n']
1 words in ['text\n']
9 words in ['this is some text\n', 'with words \n', 'on different lines\n']
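To connect this back to the question of operating on a file path, one option is to wrap the loop in a generator that takes the path as an argument. This is only a sketch: `iter_blocks` and its `border` parameter are hypothetical names, and since the question does not say what n1 and n2 mean, they are left out. The demo writes a temporary file so the example can run on its own.

```python
import os
import tempfile
from itertools import groupby


def iter_blocks(filepath, border='-BORDER-'):
    """Yield each block of text between border lines as a list of stripped lines."""
    with open(filepath) as f_input:
        for is_text, group in groupby(f_input, lambda x: not x.startswith(border)):
            if is_text:
                yield [line.strip() for line in group]


# Demo: write a small sample file, then read it back block by block.
sample = ("-BORDER-\ntext\n-BORDER-\n"
          "this is some text\nwith words \non different lines\n-BORDER-\n")
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)

blocks = list(iter_blocks(tmp.name))
os.remove(tmp.name)
print(blocks)
```

A function such as track could then loop over iter_blocks(filepath) and do its own work on each block.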

Answer 2

To retrieve the text from the text file, you could do the following:

with open("/your/path/to/file", 'r') as f:
    text_list = [line for line in f.readlines() if 'BORDER' not in line]

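text_list will contain all the text lines you are looking for. If needed, you can use .strip() to remove the trailing newline and surrounding whitespace from each line.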
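One thing to be aware of with this approach: the substring test also drops any text line that happens to contain the word BORDER, and the result is a single flat list rather than one list per block. The sketch below, using made-up sample lines, compares it with an exact match on the border marker.

```python
# Hypothetical sample lines; the second one is text that mentions BORDER.
lines = ['-BORDER-\n', 'the BORDER collie\n', '-BORDER-\n', 'text\n', '-BORDER-\n']

# Substring test from the answer above: also discards the text line.
by_substring = [line.strip() for line in lines if 'BORDER' not in line]

# Stricter test: discard only lines that are exactly the border marker.
by_exact_match = [line.strip() for line in lines if line.strip() != '-BORDER-']

print(by_substring)
print(by_exact_match)
```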

Answer 3

Write a generator that counts the border lines it encounters, and use groupby to separate the blocks:

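from itertools import groupby
from operator import itemgetter

BORDER = '-BORDER-'  # border marker, as in the question's file format

def count_border(lines, border):
    # Tag every non-border line with the number of borders seen so far.
    cnt = 0
    for line in lines:
        if line.strip() == border:
            cnt += 1
        else:
            yield cnt, line

with open('file') as lines:
    # Lines that share the same border count belong to the same block,
    # so group on the first item of each (count, line) pair.
    for _, block in groupby(count_border(lines, BORDER), itemgetter(0)):
        block = [line for _, line in block]
        print(block)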
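To show how the border count acts as the grouping key, here is a minimal Python 3 sketch that runs the same idea on an in-memory list. The sample `lines` and the `numbered` dictionary are made up for illustration; the border marker is assumed to be -BORDER- as in the question.

```python
from itertools import groupby
from operator import itemgetter


def count_border(lines, border):
    """Yield (number of borders seen so far, line) for every non-border line."""
    cnt = 0
    for line in lines:
        if line.strip() == border:
            cnt += 1
        else:
            yield cnt, line


# Hypothetical sample data with two borders in a row.
lines = ['-BORDER-\n', 'a\n', 'b\n', '-BORDER-\n', '-BORDER-\n', 'c\n', '-BORDER-\n']

# Map each border count to the stripped lines of its block.
numbered = {
    key: [line.strip() for _, line in block]
    for key, block in groupby(count_border(lines, '-BORDER-'), key=itemgetter(0))
}
print(numbered)
```

The keys reveal where each block sits in the file: any text before the first border would get key 0, and a gap in the keys (here, 2 is missing) means two borders had no text between them.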

Operating on a file with a very specific format
