How do I parse text in Python?

Time: 2009-10-31 05:26:03

Tags: python regex

Sample text:

SUBJECT = 'NETHERLANDS MUSIC EPA'
CONTENT = 'Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK '

Expected result:

"
NETHERLANDS MUSIC EPA | 36 before
Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK
"

How can I do this in Python?

2 answers:

Answer 0 (score: 1)

It looks like you want something like this:

import re

x = re.compile(r'^([^\|]*?)\s*\|[^\n]*\n\s*(.*?)\s*$')

s = """NETHERLANDS MUSIC EPA | 36 before
Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK"""

mo = x.match(s)

subject, content = mo.groups()

# print() as a function, so this runs on Python 3
print('SUBJECT =', repr(subject))
print('CONTENT =', repr(content))

which emits, as desired:

SUBJECT = 'NETHERLANDS MUSIC EPA'
CONTENT = "Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK"

Or maybe you want it the other way around (as a comment suggested)? Then the key RE might be:

y = re.compile(r'^.*SUBJECT\s*=\s*\'([^\']*)\'.*CONTENT\s*=\s*"([^"]*)"',
               re.DOTALL)

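You can use it in the same way to get a match object, extract the subject and content as its groups, and format them however you want for display.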
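A minimal sketch of that usage, assuming the pattern is compiled with `re.DOTALL` and that the CONTENT value is wrapped in double quotes (as in the `repr` output shown earlier). The shortened `text` string and the `formatted` variable are made up for illustration; the " | 36 before" part of the expected result is not reproduced because the question does not say where it comes from.

```python
import re

# Pattern from the answer, compiled with re.DOTALL so '.' can cross newlines.
y = re.compile(r'^.*SUBJECT\s*=\s*\'([^\']*)\'.*CONTENT\s*=\s*"([^"]*)"',
               re.DOTALL)

# Hypothetical, shortened input; CONTENT is assumed to be double-quoted.
text = '''SUBJECT = 'NETHERLANDS MUSIC EPA'
CONTENT = "Michael Buble performs in Amsterdam. EPA/OLAF KRAAK"'''

mo = y.match(text)
if mo is None:
    # match() returns None when the text does not fit the pattern
    raise ValueError('input does not look like SUBJECT/CONTENT text')

subject, content = mo.groups()

# Format for display: subject on the first line, content on the second.
formatted = subject + '\n' + content
print(formatted)
```

Checking for `None` before calling `groups()` avoids an `AttributeError` on input that does not match.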

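In either case you will probably need to tweak things: since you gave only an example rather than a precise specification, it is hard to generalize reliably.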

Answer 1 (score: 0)

Here is a simple solution. I am using Python 3, but I think this code is the same in Python 2:

>>> import re
>>> pair = re.compile("SUBJECT = '([^\n]*)'\nCONTENT = '([^\n]*)'\n", re.MULTILINE)
>>> s = """SUBJECT = 'NETHERLANDS MUSIC EPA'
... CONTENT = 'Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK '
... """
>>> m = pair.match(s)
>>> m.group(1) + "\n" + m.group(2)
"NETHERLANDS MUSIC EPA\nMichael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK "