How do I parse the sections of the following file in Python?

Time: 2014-05-27 11:48:51

Tags: python parsing python-3.x

I have a file like this:

Hi:
    fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds
    fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds
    Exampples:

    >>fdsfds
    >>ok

    This is it.

Hello:
    fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd
    fdsfdsfdsfdsfds
    fdsfdsfsd

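The section for Hi runs from fds... to This is it. The section for Hello runs from fds.. to fds.. I want to get just the section that belongs to each heading. I thought of the following approach:

Start from the : and then look for \n\n, which would give me each section in turn. But that does not work, because a section can itself contain a blank line or a line ending in a colon. I don't want to use regex or ConfigParser for this. I'm looking for simple parsing. How can I solve this?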

2 answers:

Answer 0: (score: 0)

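You can tell body lines apart from headings by their indentation: body lines start with spaces (four in the sample), while headings do not. For example, to print only the body lines:

indent = "    "  # four spaces, as in the sample file
with open('input.txt', 'r') as f:
    for line in f:
        if line.startswith(indent):
            # each line already ends with a newline, so suppress print's own
            print(line, end='')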
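The indentation check can be extended to group lines under their headings without any regex. The following is only a minimal sketch: the function name `split_sections` is hypothetical, and it assumes a heading is a non-indented line ending in a colon and that everything up to the next heading belongs to it (blank lines included).

```python
def split_sections(lines):
    """Group lines into {heading: [body lines]} without using regex."""
    sections = {}
    current = None
    for line in lines:
        stripped = line.rstrip("\n")
        # Assumption: a heading is a non-empty, non-indented line ending with ':'.
        if stripped and not stripped[0].isspace() and stripped.endswith(":"):
            current = stripped[:-1]
            sections[current] = []
        elif current is not None:
            # Indented lines and blank lines belong to the current section.
            sections[current].append(stripped)
    return sections


# Shortened version of the sample file from the question.
sample = """Hi:
    fdsfds
    Exampples:

    >>ok

    This is it.

Hello:
    fdsfdsfsd
"""

sections = split_sections(sample.splitlines())
for heading, body in sections.items():
    print(heading, body)
```

Because the function only strips the trailing newline, an open file object can be passed directly, for example `split_sections(f)` inside the `with open('input.txt') as f:` block. The indented `Exampples:` line is not mistaken for a heading because it starts with whitespace.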

Answer 1: (score: 0)

This is quite simple with a regular expression:

txt='''\
Hi:
    fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds
    fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds
    Exampples:

    >>fdsfds
    >>ok

    This is it.

Hello:
    fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd
    fdsfdsfdsfdsfds
    fdsfdsfsd'''

import re

print(re.findall(r'^(\w+:.*?)(?=^\w+:|\Z)', txt, re.S | re.M))  

Prints:

['Hi:\n    fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds\n    fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds\n    Exampples:\n\n    >>fdsfds\n    >>ok\n\n    This is it.\n\n', 'Hello:\n    fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd\n    fdsfdsfdsfdsfds\n    fdsfdsfsd']
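If a mapping from heading to body is more convenient than a flat list, each match can be split at its first colon. This is a sketch under stated assumptions rather than part of the answer above: it reuses the same pattern on a shortened sample, and uses `str.partition` and `textwrap.dedent` from the standard library to separate the heading and remove the common indentation.

```python
import re
import textwrap

# Shortened version of the sample text from the question.
txt = '''\
Hi:
    fdsfds
    Exampples:

    >>ok

    This is it.

Hello:
    fdsfdsfsd'''

pattern = r'^(\w+:.*?)(?=^\w+:|\Z)'

sections = {}
for block in re.findall(pattern, txt, re.S | re.M):
    # Every match starts with "<heading>:", so the first colon ends the heading.
    heading, _, body = block.partition(":")
    # Remove the common leading indentation and surrounding blank lines.
    sections[heading] = textwrap.dedent(body).strip()

for heading, body in sections.items():
    print(repr(heading), repr(body))
```

The lookahead `(?=^\w+:|\Z)` stops each match at the next non-indented heading or at the end of the text, which is why the indented `Exampples:` line stays inside the Hi section.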