I have a file like this:
Hi:
     fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds
     fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds
     Exampples:

     >>fdsfds
     >>ok

     This is it.

Hello:
     fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd
     fdsfdsfdsfdsfds
     fdsfdsfsd
The "Hi" section runs from "fds..." to "This is it.", and the "Hello" section runs from "fds.." to "fds..".
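I want to get just the section under each heading. My idea was to start from the ':' and then look for '\n\n', which would give me each section separately. But that doesn't work, because a section can itself contain the same pattern. I don't want to do this with regex or ConfigParser; I'm looking for simple parsing. How can I solve this?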
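A quick way to see why the blank-line idea breaks: splitting a shortened copy of the sample on "\n\n" produces four chunks instead of two, because the blank lines inside the Hi section are treated as boundaries too. The shortened sample and its five-space indentation are assumptions for illustration.

```python
# Shortened version of the sample file (five-space indent assumed).
sample = (
    "Hi:\n"
    "     fdsfds\n"
    "     Exampples:\n"
    "\n"
    "     >>fdsfds\n"
    "     >>ok\n"
    "\n"
    "     This is it.\n"
    "\n"
    "Hello:\n"
    "     fdsfdsfsd\n"
)

# Naive idea from the question: treat every "\n\n" as a section boundary.
chunks = sample.split("\n\n")
for chunk in chunks:
    print(repr(chunk))
```

Only the first and last chunks start with a heading; the middle two are fragments of the Hi section.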
Answer 0 (score: 0)
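You can search for the lines that do not start with five spaces: those are the headings, and every line that does start with five spaces belongs to a section. For example, this prints the section lines:
tab = "     "  # five spaces
with open('input.txt', 'r') as f:
    for line in f:
        # Indented lines are section content; headings are not indented.
        if line.startswith(tab):
            print(line, end='')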
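Building on the same indentation check, the lines can be grouped per heading instead of printed in one stream, which gives the per-section result the question asks for without regex or ConfigParser. This is a minimal sketch, not the answerer's code: `INDENT` and `parse_sections` are hypothetical names, and it assumes headings are unindented lines ending in ':' and section lines are indented by five spaces.

```python
# Hypothetical helper: INDENT and parse_sections are illustrative names,
# and the five-space indent is assumed from the answer above.
INDENT = "     "  # five spaces


def parse_sections(lines):
    """Group indented lines under the most recent unindented 'Name:' heading."""
    sections = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(INDENT):
            # Section content; strip the indent. An indented "Exampples:"
            # lands here, so it is never mistaken for a heading.
            if current is not None:
                sections[current].append(line[len(INDENT):])
        elif line.strip() == "":
            # Blank lines are kept inside the current section.
            if current is not None:
                sections[current].append("")
        elif line.endswith(":"):
            # An unindented "Name:" line starts a new section.
            current = line[:-1]
            sections[current] = []
        # Any other unindented line is ignored.
    # Drop blank lines that only separate one section from the next.
    for body in sections.values():
        while body and body[-1] == "":
            body.pop()
    return sections


sample = """\
Hi:
     fdsfds
     Exampples:

     >>fdsfds
     >>ok

     This is it.

Hello:
     fdsfdsfdsfdsfds
     fdsfdsfsd
"""

# With a real file: with open('input.txt') as f: sections = parse_sections(f)
sections = parse_sections(sample.splitlines(keepends=True))
for name, body in sections.items():
    print(name, body)
```

Because the decision is made line by line on indentation alone, blank lines inside a section no longer matter, which is exactly where splitting on '\n\n' fails.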
Answer 1 (score: 0)
This is easy with a regular expression:
import re

txt = '''\
Hi:
     fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds
     fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds
     Exampples:

     >>fdsfds
     >>ok

     This is it.

Hello:
     fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd
     fdsfdsfdsfdsfds
     fdsfdsfsd'''

print(re.findall(r'^(\w+:.*?)(?=^\w+:|\Z)', txt, re.S | re.M))
Prints:
['Hi:\n     fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds\n     fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds\n     Exampples:\n\n     >>fdsfds\n     >>ok\n\n     This is it.\n\n', 'Hello:\n     fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd\n     fdsfdsfdsfdsfds\n     fdsfdsfsd']
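To show how that pattern works, here is the same regex rewritten with re.X (verbose mode) so each piece carries a comment. It is meant to be equivalent to the compact pattern, not a different technique; `SECTION_RE` and the shortened sample are illustrative assumptions. The key point is that the lookahead only accepts a heading at column 0, so the indented "Exampples:" stays inside the Hi section.

```python
import re

# The pattern from the answer above, rewritten with re.X (verbose) so each
# piece can carry a comment; re.X ignores the whitespace and # comments.
SECTION_RE = re.compile(
    r"""
    ^               # start of a line (re.M makes ^ match after every newline)
    (               # group 1 is what findall returns:
      \w+:          #   a heading such as Hi: starting at column 0,
      .*?           #   then as little text as possible (re.S lets . match newlines)
    )
    (?=^\w+:|\Z)    # stop right before the next column-0 heading, or at end of text
    """,
    re.S | re.M | re.X,
)

# Shortened sample; the indented Exampples: line does not start at column 0,
# so the lookahead does not treat it as a new heading.
sample = "Hi:\n     fds\n     Exampples:\n\n     >>ok\n\nHello:\n     fdsfdsfsd"

for section in SECTION_RE.findall(sample):
    print(repr(section))
```

Note that the first match keeps the trailing blank lines before the next heading, just like the compact version's output.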