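Consider this text:
Would you like to have responses to your questions sent to you via email?
I'm going to propose multiple choices for several words by marking them up like this: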
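Would you like [to get]|[having]|g[to have] responses to your questions sent [up to]|g[to]|[on] you via email?
The choices are enclosed in brackets and separated by pipes. The good choice is preceded by a g.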
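I would like to parse this sentence so that the text is formatted like this:
Would you like __ responses to your questions sent __ you via email?
With a list like: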
[
    [
        {"to get":0},
        {"having":0},
        {"to have":1},
    ],
    [
        {"up to":0},
        {"to":1},
        {"on":0},
    ],
]
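Is my markup design OK?
How can I use a regex on the sentence to get the needed result and generate the list?
Edit: a user-oriented markup language is needed.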
Answer 0 (score: 3)
I would add grouping braces {} around each set of choices and, instead of a list of lists of dicts, output a list of dicts.
Code:
import re

s = 'Would you like {[to get]|[having]|g[to have]} responses to your questions sent {[up to]|g[to]|[on]} you via email ?'

def variants_to_dict(variants):
    # Map each variant to 1 if it carries the 'g' (good) flag, otherwise 0.
    dct = {}
    for is_good, s in variants:
        dct[s] = 1 if is_good == 'g' else 0
    return dct

def question_to_choices(s):
    # A choice group is everything between a pair of braces.
    choices_re = re.compile(r'\{[^}]+\}')
    # One variant: optional pipe, optional 'g' flag, then the bracketed text.
    variants_re = re.compile(r'''\|?(g?)
        \[
        ([^\]]+)
        \]
        ''', re.VERBOSE)
    choices_list = []
    for choices in choices_re.findall(s):
        choices_list.append(variants_to_dict(variants_re.findall(choices)))
    # Replace every group with a blank and return it with the parsed choices.
    return choices_re.sub('___', s), choices_list

question, choices = question_to_choices(s)
print(question)
print(choices)
Output (key order within each dict may differ between Python versions):
Would you like ___ responses to your questions sent ___ you via email ?
[{'to have': 1, 'to get': 0, 'having': 0}, {'to': 1, 'up to': 0, 'on': 0}]
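As a quick sanity check on the parsed result, the blanks can be filled back in with the good choices, which should reproduce the original sentence. This is only a minimal sketch: `good_choices` and `fill_blanks` are hypothetical helper names, and it assumes every group has exactly one variant flagged with 1.

```python
# Values as printed above (assumed input for this sketch).
question = 'Would you like ___ responses to your questions sent ___ you via email ?'
choices = [{'to get': 0, 'having': 0, 'to have': 1},
           {'up to': 0, 'to': 1, 'on': 0}]

def good_choices(choices):
    """Return the variant flagged 1 in each group (exactly one is expected)."""
    answers = []
    for group in choices:
        good = [text for text, flag in group.items() if flag == 1]
        if len(good) != 1:
            raise ValueError('expected exactly one good choice, got %r' % good)
        answers.append(good[0])
    return answers

def fill_blanks(question, answers, blank='___'):
    """Replace the blanks, left to right, with the given answers."""
    for answer in answers:
        question = question.replace(blank, answer, 1)
    return question

answers = good_choices(choices)
filled = fill_blanks(question, answers)
print(filled)
```

The `ValueError` also doubles as a cheap markup check: a group with no good choice, or with several, is reported instead of being silently accepted.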
Answer 1 (score: 2)
A rough parsing implementation using regular expressions:
import re
s = "Would you like [to get]|[having]|g[to have] responses to your questions sent [up to]|g[to]|[on] you via email ?" # pattern string
choice_groups = re.compile(r"((?:g?\[[^\]]+\]\|?)+)") # regex to get choice groups
choices = re.compile(r"(g?)\[([^\]]+)\]") # regex to extract choices within each group
# now, use the regexes to parse the string:
groups = choice_groups.findall(s)
# returns: ['[to get]|[having]|g[to have]', '[up to]|g[to]|[on]']
# parse each group to extract possible choices, along with if they are good
group_choices = [choices.findall(group) for group in groups]
# will contain [[('', 'to get'), ('', 'having'), ('g', 'to have')], [('', 'up to'), ('g', 'to'), ('', 'on')]]
# finally, substitute each choice group to form a template
template = choice_groups.sub('___', s)
# template is "Would you like ___ responses to your questions sent ___ you via email ?"
Parsing this into your format should now be easy. Good luck :)
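For completeness, one possible way to do that last step is sketched below. It assumes `group_choices` has the shape shown in the comment above (a list of groups, each a list of `(flag, text)` tuples); `to_requested_format` is a made-up name for illustration.

```python
# Assumed input: the value of `group_choices` from the snippet above.
group_choices = [
    [('', 'to get'), ('', 'having'), ('g', 'to have')],
    [('', 'up to'), ('g', 'to'), ('', 'on')],
]

def to_requested_format(group_choices):
    """Turn (flag, text) tuples into a list of lists of single-key dicts."""
    return [
        [{text: 1 if flag == 'g' else 0} for flag, text in group]
        for group in group_choices
    ]

result = to_requested_format(group_choices)
print(result)
# [[{'to get': 0}, {'having': 0}, {'to have': 1}], [{'up to': 0}, {'to': 1}, {'on': 0}]]
```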
Answer 2 (score: 2)
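I'll suggest my solution too. It uses a simpler markup, with each group in braces and the good choice marked with a +:
Would you like {to get|having|+to have} responses to your questions sent {up to|+to|on} you via email?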
import re

def extract_choices(text):
    choices = []

    def callback(match):
        # Called once per {...} group: record its variants, return the blank.
        variants = match.group().strip('{}')
        choices.append(dict(
            (v.lstrip('+'), v.startswith('+'))
            for v in variants.split('|')
        ))
        return '___'

    text = re.sub(r'\{.*?\}', callback, text)
    return text, choices
Let's try it:
>>> import pprint
>>> t = 'Would you like {to get|having|+to have} responses to your questions sent {up to|+to|on} you via email?'
>>> pprint.pprint(extract_choices(t))
('Would you like ___ responses to your questions sent ___ you via email?',
 [{'having': False, 'to get': False, 'to have': True},
  {'on': False, 'to': True, 'up to': False}])
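Once the choices are in this shape, checking what a user picked takes only a few lines. The sketch below is an illustration rather than part of the answer: `check_answers` is a hypothetical helper, and it assumes the user supplies one answer per blank, in order.

```python
# Assumed input: the choices list returned by extract_choices above.
choices = [{'having': False, 'to get': False, 'to have': True},
           {'on': False, 'to': True, 'up to': False}]

def check_answers(choices, user_answers):
    """Return one boolean per blank: True when the good variant was picked."""
    if len(user_answers) != len(choices):
        raise ValueError('one answer per blank is required')
    # dict.get falls back to False for a variant that was never offered.
    return [group.get(answer, False)
            for group, answer in zip(choices, user_answers)]

results = check_answers(choices, ['to have', 'on'])
print(results)  # [True, False]
```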
Answer 3 (score: 1)
I also think XML is better suited to this task, since there are already plenty of tools that make parsing easier and less error-prone.
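To give a rough idea of what the XML route could look like, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names (`question`, `choices`, `option`, `good`) are invented for this example and are not part of the question's markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML markup; tag and attribute names are assumptions.
xml_text = (
    '<question>Would you like '
    '<choices><option>to get</option><option>having</option>'
    '<option good="1">to have</option></choices>'
    ' responses to your questions sent '
    '<choices><option>up to</option><option good="1">to</option>'
    '<option>on</option></choices>'
    ' you via email?</question>'
)

def parse_question(xml_text):
    """Return (template, choices) from the hypothetical XML markup."""
    root = ET.fromstring(xml_text)
    parts = [root.text or '']  # text before the first <choices> element
    choices = []
    for group in root.findall('choices'):
        choices.append([
            {option.text: 1 if option.get('good') == '1' else 0}
            for option in group.findall('option')
        ])
        parts.append('___')
        parts.append(group.tail or '')  # text that follows this group
    return ''.join(parts), choices

template, parsed = parse_question(xml_text)
print(template)
print(parsed)
```

The parser handles nesting and malformed input for you, at the cost of a markup that is much less pleasant for end users to type.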
Anyway, if you decide to stick with your design, I would do something like this:
import re

question_str = ("Would you like [to get]|[having]|g[to have] "
                "responses to your questions sent "
                "[up to]|g[to]|[on] you via email ?")

def option_to_dict(option_str):
    # A leading 'g' marks the good option.
    if option_str.startswith('g'):
        name = option_str.lstrip('g')
        value = 1
    else:
        name = option_str
        value = 0
    name = name.strip('[]')
    return {name: value}

# One bracketed option (optionally 'g'-prefixed) followed by any number
# of further options, each introduced by a pipe.
regex = re.compile(r'g?\[[^]]+\](\|g?\[[^]]+\])*')

options = [[option_to_dict(option_str)
            for option_str in match.group(0).split('|')]
           for match in regex.finditer(question_str)]
print(options)

question = regex.sub('___', question_str)
print(question)
Example output:
[[{'to get': 0}, {'having': 0}, {'to have': 1}], [{'up to': 0}, {'to': 1}, {'on': 0}]]
Would you like ___ responses to your questions sent ___ you via email ?
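Note: regarding the design, I think it would be better to have markup that delimits the start and end of the whole set of options, not just of each individual option.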