Combining multiple consecutive words in Python

Time: 2017-11-02 03:37:25

Tags: python

I have the following list of lists.

mycookbook= [["i", "love", "tim", "tam", "and", "chocolate", "ice", "cream"], ["cooking", 
"fresh", "vegetables", "is", "easy"], ["fresh", "vegetables", "and", "fruits", "are", "good", 
"for", "health"]]

I also have a list like the following.

mylist = ["tim tam", "chocolate ice cream", "fresh vegetables and fruits"]

Now I want to find the runs of consecutive words that appear in mylist and combine them, updating mycookbook as follows:

mycookbook = [["i", "love", "tim tam", "and", "chocolate ice cream"], ["cooking", "fresh vegetables", 
"is", "easy"],["fresh vegetables and fruits", "are", "good", "for", "health"]]

I am currently using the following code, which only handles two-word phrases.

for sentence in mycookbook:
    i = 0
    while i < len(sentence) - 1:
        if sentence[i] + ' ' + sentence[i + 1] in mylist:
            sentence[i] += ' ' + sentence[i + 1]
            sentence.pop(i + 1)
        i += 1
print(mycookbook)

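2 Answers:

Answer 0: (score: 2)

You need nested loops: one for the starting index of the phrase and one for the ending index. You can then use a list slice to get all the words between them.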

for sentence in mycookbook:
    i = 0
    while i < len(sentence):
        for j in range(i + 1, len(sentence)+1):
            phrase = ' '.join(sentence[i:j])
            if phrase in mylist:
                sentence[i:j] = [phrase]
                break
        i += 1

We can't use for i in range(len(sentence)), because the length of sentence changes whenever we replace a slice with the phrase.
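As a side note, the loop above stops at the shortest phrase that matches at each position and modifies the sentences in place. The same idea can be sketched as a function that builds new lists and tries the longest candidate first. The names `merge_phrases`, `phrase_set`, and `max_len` are illustrative assumptions, not part of the answer above.

```python
def merge_phrases(sentences, phrases):
    """Return new sentences with consecutive words merged into known phrases."""
    phrase_set = set(phrases)  # set membership tests are fast on average
    # No phrase has more words than this, so longer slices need no check.
    max_len = max((len(p.split()) for p in phrase_set), default=1)
    result = []
    for sentence in sentences:
        merged = []
        i = 0
        while i < len(sentence):
            # Try the longest candidate first, down to two words.
            for j in range(min(len(sentence), i + max_len), i + 1, -1):
                phrase = " ".join(sentence[i:j])
                if phrase in phrase_set:
                    merged.append(phrase)
                    i = j  # skip past the words just merged
                    break
            else:
                # No multi-word phrase starts here; keep the word as-is.
                merged.append(sentence[i])
                i += 1
        result.append(merged)
    return result


mycookbook = [
    ["i", "love", "tim", "tam", "and", "chocolate", "ice", "cream"],
    ["cooking", "fresh", "vegetables", "is", "easy"],
    ["fresh", "vegetables", "and", "fruits", "are", "good", "for", "health"],
]
mylist = ["tim tam", "chocolate ice cream", "fresh vegetables and fruits"]

merged_cookbook = merge_phrases(mycookbook, mylist)
print(merged_cookbook)
```

Because the result is built as a new list, the index `i` never has to account for a sentence shrinking underneath it, and the original `mycookbook` is left untouched.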


Answer 1: (score: 0)

The first answer is more efficient; I tried an approach using itertools:

mycookbook= [["i", "love", "tim", "tam", "and", "chocolate", "ice", "cream"], ["cooking",
"fresh", "vegetables", "is", "easy"], ["fresh", "vegetables", "and", "fruits", "are", "good",
"for", "health"]]
mylist = ["tim tam", "chocolate ice cream", "fresh vegetables and fruits"]


import itertools

split_list=[i.split() for i in mylist]

for item in split_list:
    for element in mycookbook:
        for iterindex in itertools.product(enumerate(element),repeat=len(item)):
            combination=list(zip(*iterindex))
            match=combination[0]
            if " ".join(combination[1])==" ".join(item):
                for index in match:
                    element[index]=" ".join(item)
replace_list=[]
for item in mycookbook:
    new=[]
    for item1 in item:
        if item1 not in new:
            new.append(item1)
    replace_list.append(new)

print(replace_list)

Output:

[['i', 'love', 'tim tam', 'and', 'chocolate ice cream'], ['cooking', 'fresh', 'vegetables', 'is', 'easy'], ['fresh vegetables and fruits', 'are', 'good', 'for', 'health']]
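
Two caveats about the itertools code above, offered as observations rather than a correction. `itertools.product` pairs up words at any positions, so the match does not require the words to be adjacent, and the final de-duplication step also drops words that legitimately occur twice in a sentence. A minimal sketch that only compares consecutive windows might look like this; `merge_consecutive` is a hypothetical helper name, not something from the answer.

```python
def merge_consecutive(sentence, phrases):
    """Merge runs of adjacent words that spell out one of the phrases."""
    words = list(sentence)  # work on a copy, leave the input untouched
    # Longest phrases first, so a long phrase wins over a shorter one inside it.
    for phrase in sorted(phrases, key=lambda p: len(p.split()), reverse=True):
        target = phrase.split()
        n = len(target)
        if n < 2:
            continue  # single words (or empty strings) need no merging
        i = 0
        while i + n <= len(words):
            if words[i:i + n] == target:
                words[i:i + n] = [phrase]  # replace the window with one item
            i += 1
    return words


repeated = merge_consecutive(["ice", "cream", "and", "ice", "cream"], ["ice cream"])
separated = merge_consecutive(["tim", "and", "tam"], ["tim tam"])
print(repeated)
print(separated)
```

The first call keeps both occurrences of the merged phrase, and the second leaves the sentence unchanged because "tim" and "tam" are not adjacent.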