Question

I have a text file, test.txt, that contains the following data:

content content
more content
content conclusion
==========
content again
more of it
content conclusion
==========
content
content
contend done
==========

I want to get a list of the blocks separated by ==========.

For the example above, I expect something like this:

foo = ["content content\more content\content conclusion",
       "content again\more of it\content conclusion",
       "content\content\contend done"]

Also, I would appreciate it if someone could share the general procedure for doing this (if there is one).

Inspired by: Splitting large text file on every blank line

Answer 1

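import re

y = """content content
more content
content conclusion
==========
content again
more of it
content conclusion
==========
content
content
contend done
=========="""

# Capture the text between the start of the string or a run of ten '='
# and the next run of ten '=' or the end, ignoring surrounding newlines.
x = re.compile(r"(?:^|(?<=={10}))\n*([\s\S]+?)\n*(?=={10}|$)")
print(x.findall(y))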

Output:

['content content\nmore content\ncontent conclusion', 'content again\nmore of it\ncontent conclusion', 'content\ncontent\ncontend done']

Answer 2

You can use a regular expression to split the file on runs of 3 or more = characters, then replace the newlines with backslashes:

import re

with open(file_name) as f:
    my_list = [chunk.strip().replace('\n', '\\') for chunk in re.split(r'={3,}', f.read())]

If you know the exact number of equals signs, you can use the string split method instead:

N = 5 # this is an example
with open(file_name) as f:
    my_list = [chunk.strip().replace('\n', '\\') for chunk in f.read().split('=' * N)]
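One thing to watch for with either variant: because the sample file ends with a separator line, the split leaves an empty string as the last element. A minimal sketch that drops it, using an inline string in place of the file (the names `sample`, `pieces` and `chunks` are illustrative only):

```python
import re

# Inline stand-in for the contents of test.txt, including the final newline.
sample = (
    "content content\nmore content\ncontent conclusion\n==========\n"
    "content again\nmore of it\ncontent conclusion\n==========\n"
    "content\ncontent\ncontend done\n==========\n"
)

# Split on runs of 3 or more '=' and strip surrounding whitespace.
pieces = [chunk.strip() for chunk in re.split(r'={3,}', sample)]

# The text after the final separator is just a newline, which strips to ''.
chunks = [piece for piece in pieces if piece]
print(chunks)
```

Filtering on truthiness also removes any chunk that is empty because two separator lines appear back to back.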

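Also note that the backslash is the escape character in Python string literals: it escapes the character that follows it, so that character is no longer interpreted with its usual meaning. This is why a single literal backslash has to be written as '\\' in the code above.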

So it is better to join the lines with a different delimiter:

N = 5 # this is an example
with open(file_name) as f:
    my_list = [chunk.strip().replace('\n', '/') for chunk in f.read().split('=' * N)]
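To make the escaping point above concrete, here is a small sketch (the variable name `joined` is just for illustration). It shows that `'\\'` in source code is a single character in the resulting string, and that the doubled backslash only appears when the string is displayed with `repr()`:

```python
joined = "content content\nmore content".replace('\n', '\\')

# '\\' in source code is a single backslash character in the string.
print(len('\\'))   # 1
print(joined)      # content content\more content

# repr() shows the backslash doubled; that is only how it is displayed.
print(repr(joined))
```

This is also why printing a list of such strings shows doubled backslashes: lists display their elements using `repr()`.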

Answer 3

Use the split method.

with open('file.txt') as f:
    data = f.read()
print(data.split('=========='))
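Note that a plain split keeps the newlines around each chunk and, for the sample file, leaves an empty trailing piece. As for a general procedure, one option that also suits large files is to read line by line and collect lines until a separator line appears. The following is a sketch under stated assumptions, not a definitive implementation: the helper name `iter_blocks` is hypothetical, and a separator is assumed to be a line made up solely of `=` characters.

```python
import io

def iter_blocks(lines, sep_char="="):
    """Yield blocks of lines separated by lines made up only of sep_char."""
    block = []
    for line in lines:
        stripped = line.rstrip("\n")
        # A separator is a non-empty line containing nothing but sep_char.
        if stripped and set(stripped) == {sep_char}:
            if block:
                yield "\n".join(block)
            block = []
        else:
            block.append(stripped)
    # Emit whatever follows the last separator, if anything.
    if block:
        yield "\n".join(block)

# io.StringIO stands in for open('test.txt') so the example runs on its own.
sample = io.StringIO(
    "content content\nmore content\ncontent conclusion\n==========\n"
    "content again\nmore of it\ncontent conclusion\n==========\n"
    "content\ncontent\ncontend done\n==========\n"
)
foo = list(iter_blocks(sample))
print(foo)
```

With a real file, passing the open file object (for example `with open('test.txt') as f: foo = list(iter_blocks(f))`) should behave the same way, since iterating a text file yields its lines one at a time without loading the whole file.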
