I need to write a function that generates a list from a text:
text = '^to[by, from] all ^appearances[appearance]'
list = ['to all appearances', 'to all appearance', 'by all appearances',
'by all appearance', 'from all appearances', 'from all appearance']
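In other words, each value in brackets should be substituted for the preceding word, the one marked with ^. I want the function to take five parameters, as shown below...
My code (it doesn't work):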
import re

def addSubstitution(buf, substitutions, val1='[', val2=']', dsym=',', start_p="^"):
    # Odd indexes of buf hold the marked groups, even indexes the plain text.
    for i in range(1, len(buf), 2):
        buff = []
        buff.extend(buf)
        if re.search('''[^{2}]+[{0}][^{1}{0}]+?[{1}]'''.format(val1, val2, start_p), buff[i]):
            substrs = re.split('['+val1+']'+'|'+'['+val2+']'+'|'+dsym, buff[i])
            # Recurse once for each alternative of the first unexpanded group.
            for substr in substrs:
                if substr:
                    buff[i] = substr
                    addSubstitution(buff, substitutions, val1, val2, dsym, start_p)
            return
    # No group left to expand: store the finished string.
    substitutions.add(''.join(buf))
    pass

def getSubstitution(text, val1='[', val2=']', dsym=',', start_p="^"):
    pattern = '''[^{2}]+[{0}][^{1}{0}]+?[{1}]'''.format(val1, val2, start_p)
    texts = re.split(pattern,text)
    opttexts = re.findall(pattern,text)
    buff = []
    p = iter(texts)
    t = iter(opttexts)
    buf = []
    # Interleave the plain text pieces with the matched groups.
    while True:
        try:
            buf.append(next(p))
            buf.append(next(t))
        except StopIteration:
            break
    substitutions = set()
    addSubstitution(buf, substitutions, val1, val2, dsym, start_p)
    substitutions = list(substitutions)
    substitutions.sort(key=len)
    return substitutions
Answer 0 (score: 1)
One approach could be the following (I'm skipping the string manipulation code):
text = '^to[by, from] all ^appearances[appearance]'
Step 1: Tokenize text like this:
tokenizedText = ['^to[by, from]', 'all', '^appearances[appearance]']
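Since the string handling is skipped here, this is one possible way to get those tokens with a regular expression. The helper name `tokenize` and the pattern are assumptions of this sketch, not part of the answer. It assumes the marker is ^, the alternatives sit in square brackets, and brackets are not nested.

```python
import re

def tokenize(text):
    # Hypothetical helper: a marked word together with its bracketed
    # alternatives is kept as one token (it may contain spaces);
    # everything else is split on whitespace.
    return re.findall(r'\^\w+\[[^\]]*\]|\S+', text)

tokenizedText = tokenize('^to[by, from] all ^appearances[appearance]')
print(tokenizedText)
```

A plain `text.split()` would not be enough, because `'^to[by, from]'` contains a space inside the brackets.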
Step 2: Build a list of all the words we need the Cartesian product of (the words that start with ^).
import re

combinationList = []
for word in tokenizedText:
    if word[0] == '^':
        # One possible split: '^to[by, from]' -> ['to', 'by', 'from']
        combinationList.append([w for w in re.split(r'[\[\],\s]+', word[1:]) if w])
# combinationList == [['to', 'by', 'from'], ['appearances', 'appearance']]
Step 3: Use itertools.product(...):
import itertools

for substitution in itertools.product(*combinationList):
    counter = 0
    sentence = []
    for word in tokenizedText:
        if word[0] == '^':
            # Take the next chosen alternative for this marked word.
            sentence.append(substitution[counter])
            counter += 1
        else:
            sentence.append(word)
    print(' '.join(sentence))  # Or append this to a list if you want to return all substitutions.
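Putting the three steps together, a function with the five parameters from the question might look roughly like this. It is only a sketch: the name `get_substitutions` is made up, the parameter names are borrowed from the question, and it assumes groups are not nested and that a marked word contains no whitespace.

```python
import itertools
import re

def get_substitutions(text, val1='[', val2=']', dsym=',', start_p='^'):
    # Hypothetical end-to-end sketch; escape the delimiters so that
    # characters such as '[' or '^' are matched literally.
    o, c, m = re.escape(val1), re.escape(val2), re.escape(start_p)
    # A marked group looks like: ^word[alt1, alt2]
    group = re.compile(r'{m}([^{o}{c}\s]+){o}([^{c}]*){c}'.format(m=m, o=o, c=c))

    options = []  # one list of alternatives per marked word
    parts = []    # literal text between the marked words
    pos = 0
    for match in group.finditer(text):
        parts.append(text[pos:match.start()])
        alternatives = [match.group(1)]
        alternatives += [a.strip() for a in match.group(2).split(dsym) if a.strip()]
        options.append(alternatives)
        pos = match.end()
    parts.append(text[pos:])

    results = []
    for combo in itertools.product(*options):
        # Interleave the literal parts with the chosen alternatives.
        pieces = [parts[0]]
        for choice, literal in zip(combo, parts[1:]):
            pieces.append(choice)
            pieces.append(literal)
        results.append(''.join(pieces))
    return results

print(get_substitutions('^to[by, from] all ^appearances[appearance]'))
```

Because `itertools.product` varies the last group fastest, the results come out in the same order as the list in the question, so no sorting is needed.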