Parsing and grouping text in a string using Python

Posted: 2010-11-11 16:08:14

Tags: python parsing pyparsing

I need to parse a series of short strings that consist of 3 parts: a question and two possible answers. The strings always follow the same format:

This is the question "answer_option_1 in quotes" "answer_option_2 in quotes"

I need to extract the question part and the two possible answers, each of which is enclosed in single or double quotes.

Examples:
What color is the sky today? "blue" or "grey"
Who will win the game 'Michigan' 'Ohio State'

How can I do this in Python?

4 answers:

Answer 0 (score: 1)

>>> import re
>>> s = "Who will win the game 'Michigan' 'Ohio State'"
>>> re.match(r'(.+)\s+([\'"])(.+?)\2\s+([\'"])(.+?)\4', s).groups()
('Who will win the game', "'", 'Michigan', "'", 'Ohio State')
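In the pattern above, the backreferences `\2` and `\4` require each answer to close with the same quote character that opened it. Because the two answers may only be separated by whitespace (`\s+`), however, the first example in the question (`"blue" or "grey"`) does not match, and `re.match` returns `None`. A minimal sketch that makes the word "or" optional, assuming it is the only connector that can appear between the answers (`QA_PATTERN` and `parse_qa` are hypothetical names, not from the original answer):

```python
import re

# Question, then two answers quoted with ' or ", optionally separated by "or".
# \2 and \4 force each closing quote to match its opening quote.
QA_PATTERN = re.compile(r"""(.+?)\s+(['"])(.+?)\2\s+(?:or\s+)?(['"])(.+?)\4\s*$""")


def parse_qa(s):
    """Return (question, answer1, answer2) or raise ValueError."""
    m = QA_PATTERN.match(s)
    if m is None:
        raise ValueError("unrecognised format: %r" % s)
    # Groups 2 and 4 are the quote characters; they are not needed afterwards.
    question, _, answer1, _, answer2 = m.groups()
    return question, answer1, answer2


print(parse_qa('What color is the sky today? "blue" or "grey"'))
# ('What color is the sky today?', 'blue', 'grey')
print(parse_qa("Who will win the game 'Michigan' 'Ohio State'"))
# ('Who will win the game', 'Michigan', 'Ohio State')
```

Raising `ValueError` instead of calling `.groups()` on `None` gives a clearer error when a line does not fit the expected format.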

Answer 1 (score: 1)

If your format is that simple (i.e. there is no "or" between the answers), you don't need a regular expression. Just split the line:

>>> line = 'What color is the sky today? "blue" "grey"'.strip('"')
>>> questions, answers = line.split('"', 1)
>>> answer1, answer2 = answers.split('" "')
>>> questions
'What color is the sky today? '
>>> answer1
'blue'
>>> answer2
'grey'
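The split above only handles double quotes and answers separated by a single space. A minimal sketch of the same string-method idea that also accepts single quotes and an optional "or", assuming every line ends with the closing quote of the second answer and that the question itself contains no quote characters (an apostrophe, as in "Who's", would break it). `split_question` is a hypothetical helper name:

```python
def split_question(line):
    """Split 'question "a1" "a2"' (or the same with ' quotes) using str methods."""
    line = line.strip()
    quote = line[-1]  # assumed: the line ends with the closing quote of answer 2
    question, rest = line.split(quote, 1)
    # rest looks like a1<q><separator><q>a2<q>, e.g. 'blue" or "grey"',
    # so splitting on the quote gives ['blue', ' or ', 'grey', ''].
    parts = rest.split(quote)
    if len(parts) < 3:
        raise ValueError("expected two quoted answers in %r" % line)
    return question.strip(), parts[0], parts[2]


print(split_question('What color is the sky today? "blue" or "grey"'))
print(split_question("Who will win the game 'Michigan' 'Ohio State'"))
```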

Answer 2 (score: 0)

One option is to use a regular expression.

import re

# Group 1: the question; groups 2 and 3: the text inside the two quoted answers
robj = re.compile(r'^(.*) [\"\'](.*)[\"\'].*[\"\'](.*)[\"\']')
str1 = "Who will win the game 'Michigan' 'Ohio State'"
r1 = robj.match(str1)
print(r1.groups())
str2 = 'What color is the sky today? "blue" or "grey"'
r2 = robj.match(str2)
print(r2.groups())

Output:

('Who will win the game', 'Michigan', 'Ohio State')
('What color is the sky today?', 'blue', 'grey')
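Because each `[\"\']` matches either quote character on its own, the pattern above would also accept mismatched quotes such as `"blue'`. A minimal sketch of a variation (not from the original answer) pairs each closing quote with its opening one through a backreference and uses `re.findall` to collect however many quoted answers there are, taking the question to be the text before the first quote. It assumes the question itself contains no quote characters; `QUOTED` and `question_and_answers` are hypothetical names:

```python
import re

# Group 1 captures the opening quote; (.*?) is the shortest text up to the
# same quote character (\1), so 'Michigan' and "blue" match but "blue' does not.
QUOTED = re.compile(r"""(['"])(.*?)\1""")


def question_and_answers(s):
    first = QUOTED.search(s)
    if first is None:
        raise ValueError("no quoted answers found in %r" % s)
    question = s[:first.start()].strip()
    # With two groups, findall returns (quote, text) tuples; keep only the text.
    answers = [text for _quote, text in QUOTED.findall(s)]
    return question, answers


print(question_and_answers('What color is the sky today? "blue" or "grey"'))
print(question_and_answers("Who will win the game 'Michigan' 'Ohio State'"))
```

Text between the quoted answers, such as "or", is simply skipped by `findall`, so no special case is needed for it.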

Answer 3 (score: 0)

Pyparsing gives you a solution that tolerates some variation in the input text:

questions = """\
What color is the sky today? "blue" or "grey"
Who will win the game 'Michigan' 'Ohio State'""".splitlines()

from pyparsing import *

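# Strip the enclosing quote characters from every matched quoted string
quotedString.setParseAction(removeQuotes)
# Q: everything up to the first quoted string; A: quoted strings, optionally separated by "or"
q_and_a = SkipTo(quotedString)("Q") + delimitedList(quotedString, Optional("or"))("A")

for qn in questions:
    print(qn)
    qa = q_and_a.parseString(qn)
    print("qa.Q", qa.Q)
    print("qa.A", qa.A)
    print()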

This prints:

What color is the sky today? "blue" or "grey"
qa.Q What color is the sky today? 
qa.A ['blue', 'grey']

Who will win the game 'Michigan' 'Ohio State'
qa.Q Who will win the game 
qa.A ['Michigan', 'Ohio State']