Python: how to parse text by sections?

Time: 2017-08-02 10:36:24

Tags: python parsing text

So, I have sections with text in them:

[Section1]
Some weired text in section 1

[Section2]
Some text in section 2
Some text
text

How can I get the text from one of these sections?

3 answers:

Answer 0: (score: 3)

import re
sections = re.split(r'\[Section\d+\]', text)

Then you can get the text of one section by indexing the resulting list. In your case:

sections[1] will give the text of section 1 (sections[0] holds whatever precedes the first header, which is an empty string here).
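
A minimal sketch building on the split above, assuming every header looks like [SectionN]: wrapping the header name in a capturing group makes re.split return the names as well, so a section can be looked up by name instead of by position. The names parts and sections_by_name are illustrative and not part of the original answer.

```python
import re

text = """[Section1]
Some weired text in section 1

[Section2]
Some text in section 2
Some text
text"""

# With a capturing group, re.split keeps the matched header names in the
# result, so names and bodies alternate after the first element.
parts = re.split(r'\[(Section\d+)\]', text)

# parts[0] is whatever precedes the first header (empty for this input).
names = parts[1::2]
bodies = parts[2::2]

# Strip the surrounding newlines from each body and index it by name.
sections_by_name = {name: body.strip() for name, body in zip(names, bodies)}

print(sections_by_name['Section2'])
```

Looking sections up by name avoids the off-by-one caused by the leading element of the split result.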

Answer 1: (score: 0)

Try this:

text="""[Section1]
Some weired text in section 1

[Section2]
Some text in section 2
Some text
text"""
print(text.split('\n\n'))
# prints: ['[Section1]\nSome weired text in section 1', '[Section2]\nSome text in section 2\nSome text\ntext']
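
A rough sketch of how the blocks from this split could be turned into a lookup by section name. It assumes that sections are separated by exactly one blank line and that no section contains a blank line of its own; split_blocks is a hypothetical helper name.

```python
text = """[Section1]
Some weired text in section 1

[Section2]
Some text in section 2
Some text
text"""


def split_blocks(raw):
    """Map each header (the first line of a block) to the rest of that block."""
    result = {}
    for block in raw.split('\n\n'):
        if not block.strip():
            continue  # skip empty blocks, e.g. from trailing blank lines
        # The first line is the header; everything after it is the body.
        header, _, body = block.partition('\n')
        # Remove the brackets and surrounding whitespace from the header.
        result[header.strip('[] \t')] = body
    return result


blocks = split_blocks(text)
print(blocks['Section1'])
```

If a section may contain blank lines, this approach splits it into several blocks, so a header-based parser is the safer choice in that case.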

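Answer 2: (score: 0)

As shown below, this code produces a dictionary of the lines in each section, in order, indexed by section name.

It reads the file line by line. When it recognizes a section header, it records the name. Each subsequent line, up to the next header, is saved in sections in a list under that name.

If you don't want or need the line endings, strip them in the append statement.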

>>> import re
>>> patt = re.compile(r'^\s*\[\s*(section\d+)\s*\]\s*$', re.I)
>>> sections = {}
>>> with open('to_chew.txt') as to_chew:
...     while True:
...         line = to_chew.readline()
...         if line:
...             m = patt.match(line)
...             if m:
...                 section_name = m.groups()[0]
...                 sections[section_name] = []
...             else:
...                 sections[section_name].append(line)
...         else:
...             break
...             
>>> sections
{'Section2': ['Some text in section 2\n', 'Some text\n', 'text'], 'Section1': ['Some weired text in section 1\n', '\n']}

Edit: simplified code.

>>> import re
>>> from collections import defaultdict
>>> patt = re.compile(r'^\s*\[\s*(section\d+)\s*\]\s*$', re.I)
>>> sections = defaultdict(list)
>>> with open('to_chew.txt') as to_chew:
...     for line in to_chew:
...         m = patt.match(line)
...         if m:
...             section_name = m.groups()[0]
...         else:
...             sections[section_name].append(line)
... 
>>> sections
defaultdict(<class 'list'>, {'Section1': ['Some weired text in section 1\n', '\n'], 'Section2': ['Some text in section 2\n', 'Some text\n', 'text']})
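
A hedged variant of the same line-by-line approach, wrapped in a function. It strips the line endings in the append statement, as suggested above, and ignores any lines that appear before the first header instead of raising a NameError. The function name parse_sections and the in-memory sample (used here in place of to_chew.txt) are assumptions for illustration.

```python
import io
import re

# Same header pattern as in the answer: [SectionN], case-insensitive.
HEADER = re.compile(r'^\s*\[\s*(section\d+)\s*\]\s*$', re.I)


def parse_sections(lines):
    """Return {section name: [lines]} for any iterable of text lines."""
    sections = {}
    current = None  # no header seen yet
    for line in lines:
        m = HEADER.match(line)
        if m:
            current = m.group(1)
            # Register the section even if it turns out to have no lines.
            sections.setdefault(current, [])
        elif current is not None:
            # Drop the trailing newline; lines before any header are skipped.
            sections[current].append(line.rstrip('\n'))
    return sections


sample = io.StringIO(
    "[Section1]\nSome weired text in section 1\n\n"
    "[Section2]\nSome text in section 2\nSome text\ntext"
)
parsed = parse_sections(sample)
print(parsed['Section2'])
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, for example parse_sections(open('to_chew.txt')).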