Search for two consecutive words and combine them in Python

Time: 2017-11-02 01:52:50

Tags: python

I have the following list of lists.

mycookbook= [["i", "love", "tim", "tam", "and", "ice", "cream"], ["cooking", 
"fresh", "vegetables", "is", "easy"], ["fresh", "vegetables", "are", "good", 
"for", "health"]]

I also have the following list.

mylist = ["tim tam", "ice cream", "fresh vegetables"]

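Now I want to find the consecutive words in mycookbook that match an entry of mylist and combine them, updating mycookbook.

This is what I am currently doing.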

for sentence in mycookbook:
    for sub in sentence:
        if sub is (mylist[0].split(" ")[0]):

But I don't know how to check the next word, because there is no next() command here. Please help me.

7 answers:

Answer 0 (score: 0)

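You want to iterate over the indices and, at each position, look ahead as far as the phrase requires. So, something like this: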

new_sentence = []
index = 0
while index < len(sentence):
    for word in mylist:
        wordlist = word.split()
        if sentence[index:][:len(wordlist)] == wordlist: # This will take the first `len(wordlist)` elements and see if it's a match
            new_sentence.append(word)
            index += len(wordlist)
            break
    else:
        new_sentence.append(sentence[index])
        index += 1

You can try it here: Try it Online!
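The loop above works on a single `sentence`. To apply it to the whole cookbook it could be wrapped in a function, roughly as below. This is only a sketch: the function name `merge_phrases` and the guard against empty phrases are assumptions added here, not part of the original answer.

```python
def merge_phrases(sentence, phrases):
    """Return a new list in which runs of words matching a phrase are joined."""
    new_sentence = []
    index = 0
    while index < len(sentence):
        for phrase in phrases:
            words = phrase.split()
            # guard against empty phrases, which would never advance the index
            if words and sentence[index:index + len(words)] == words:
                new_sentence.append(phrase)
                index += len(words)
                break
        else:
            # no phrase starts at this position: keep the word as it is
            new_sentence.append(sentence[index])
            index += 1
    return new_sentence


mycookbook = [["i", "love", "tim", "tam", "and", "ice", "cream"],
              ["cooking", "fresh", "vegetables", "is", "easy"],
              ["fresh", "vegetables", "are", "good", "for", "health"]]
mylist = ["tim tam", "ice cream", "fresh vegetables"]

mycookbook = [merge_phrases(sentence, mylist) for sentence in mycookbook]
print(mycookbook)
```

Building a new list for each sentence avoids modifying a list while iterating over it.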

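Answer 1 (score: 0)

You can iterate over each sentence in the original mycookbook. For each sentence, start with a pointer i at the first word.

  • Case 1: if sentence[i] + ' ' + sentence[i+1] is not in mylist, simply add sentence[i] to the new sentence and move the pointer forward by 1.

  • Case 2: if sentence[i] + ' ' + sentence[i+1] is in mylist, add it to the new sentence as a single word and move the pointer forward by 2.

Example below.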

mycookbook= [["i", "love", "tim", "tam", "and", "ice", "cream"], ["cooking",
"fresh", "vegetables", "is", "easy"], ["fresh", "vegetables", "are", "good",
"for", "health"]]

mylist = ["tim tam", "ice cream", "fresh vegetables"]

mycookbook_new = []
for sentence in mycookbook:
    i = 0
    sentence_new = []
    while i < len(sentence):
        if (i == len(sentence)-1 or sentence[i] + ' ' + sentence[i+1] not in mylist):
            sentence_new.append(sentence[i]) # unchanged
            i += 1
        else:
            sentence_new.append(sentence[i] + ' ' + sentence[i+1])
            i += 2
    mycookbook_new.append(sentence_new)

print(mycookbook_new)
'''
[
  ['i', 'love', 'tim tam', 'and', 'ice cream'], 
  ['cooking', 'fresh vegetables', 'is', 'easy'], 
  ['fresh vegetables', 'are', 'good', 'for', 'health']
]
'''

Answer 2 (score: 0)

mycookbook= [["i", "love", "tim", "tam", "and", "ice", "cream"], ["cooking",
"fresh", "vegetables", "is", "easy"], ["fresh", "vegetables", "are", "good",
"for", "health"]]


mylist = ["tim tam", "ice cream", "fresh vegetables"]

result_cookbook = []
for cb in mycookbook:
    cook_book = []
    need_continue = False
    for index, word in enumerate(cb):
        if need_continue:
            need_continue = False
            continue 
        if index < len(cb) - 1:
            # can combine with next word
            combine_word = "{} {}".format(cb[index], cb[index+1])
            if combine_word in mylist:
                cook_book.append(combine_word)
                need_continue = True
            else:
                cook_book.append(word)
        else:
            cook_book.append(word)
    result_cookbook.append(cook_book)
print(result_cookbook)

Answer 3 (score: 0)

Use zip to iterate over each pair made of a word and the next word. If the pair is in mylist, append it as a single string and skip the next iteration.

out = []
for sentence in mycookbook:
    new_sentence = []
    skip = False
    for pairs in zip(sentence, sentence[1:]+['']):
        if skip:
            skip = False
            continue
        if ' '.join(pairs) in mylist:
            new_sentence.append(' '.join(pairs))
            skip = True
        else:
            new_sentence.append(pairs[0])
    out.append(new_sentence)

Answer 4 (score: 0)

for sentence in mycookbook:
    i = 0
    # stop at the last word: it has no successor to merge with
    while i < len(sentence) - 1:
        for m in mylist:
            words = m.split(' ')
            # merge in place only when the whole phrase matches at position i
            if sentence[i:i + len(words)] == words:
                sentence[i:i + len(words)] = [m]
                break
        i += 1

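Answer 5 (score: 0)

A more readable version, split into smaller functions.

Notes

  1. The solution does not use any (numeric) indexes at all.
  2. It does not use any stdlib helpers such as itertools, zip or range.
  3. It does not mutate any objects; all objects are treated as immutable, i.e. no pop, append or += is used.
  4. It can easily be modified to read from an input file and print to another file.
  5. If modified to read from and write to files, it will use minimal memory, because nothing is stored whole in a list; i.e. it works lazily.
  6. Code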

    def as_pairs(iterable):
        """
        yields (item, next_item) pairs from iterable;
        the last item is paired with None so that it is not lost
        """
        iterator = iter(iterable)
        try:
            current_item = next(iterator)
        except StopIteration:
            return
        for next_item in iterator:
            yield current_item, next_item
            current_item = next_item
        yield current_item, None


    def merge_pairs(pair_words, word_list):
        """
        Merges adjacent words of word_list that form one of the pair_words
        """
        pair_map = {tuple(pair_word.split(" ")): pair_word for pair_word in pair_words}
        skip_next = False
        for pair in as_pairs(word_list):
            if skip_next:
                # the first word of this pair was already merged into the previous one
                skip_next = False
            elif pair in pair_map:
                yield pair_map[pair]
                skip_next = True
            else:
                first, second = pair
                yield first

    def main():
        mycookbook = [
                ["i", "love", "tim", "tam", "and", "ice", "cream"],
                ["cooking", "fresh", "vegetables", "is", "easy"],
                ["fresh", "vegetables", "are", "good", "for", "health"]
                ]

        mylist = ["tim tam", "ice cream", "fresh vegetables"]
        return [list(merge_pairs(mylist, sentence)) for sentence in mycookbook]

    print(main())

    Output:

    [['i', 'love', 'tim tam', 'and', 'ice cream'], ['cooking', 'fresh vegetables', 'is', 'easy'], ['fresh vegetables', 'are', 'good', 'for', 'health']]
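Notes 4 and 5 above mention adapting the approach to files. A minimal sketch of that idea follows. The helper names `merge_words` and `merge_lines`, the one-sentence-per-line input format, and the comma used to separate output tokens are all assumptions made for illustration; `io.StringIO` objects stand in for real files.

```python
import io


def merge_words(words, phrases):
    """Lazily yield words, joining adjacent pairs that appear in phrases."""
    pairs = {tuple(p.split(" ")): p for p in phrases}
    iterator = iter(words)
    current = next(iterator, None)
    while current is not None:
        following = next(iterator, None)
        if (current, following) in pairs:
            yield pairs[(current, following)]
            # both words are consumed, so move on to the word after them
            current = next(iterator, None)
        else:
            yield current
            current = following


def merge_lines(lines, phrases):
    """Lazily yield one merged, comma-separated line per input line."""
    for line in lines:
        yield ",".join(merge_words(line.split(), phrases))


# StringIO objects stand in for an input file and an output file
source = io.StringIO("i love tim tam and ice cream\n"
                     "cooking fresh vegetables is easy\n")
target = io.StringIO()
for merged in merge_lines(source, ["tim tam", "ice cream", "fresh vegetables"]):
    target.write(merged + "\n")

print(target.getvalue(), end="")
# i,love,tim tam,and,ice cream
# cooking,fresh vegetables,is,easy
```

Only one line is held in memory at a time, so the same loop should work with objects returned by `open()` in place of the `StringIO` stand-ins.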

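Answer 6 (score: 0)

Here is a solution. If you care about performance, mylist should be indexed in some way so that the match function can do better than a sequential lookup.

Bonus: entries in mylist can contain any number of words, not just two; note the added "good for health".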

mycookbook= [["i", "love", "tim", "tam", "and", "ice", "cream"], ["cooking",
"fresh", "vegetables", "is", "easy"], ["fresh", "vegetables", "are", "good",
"for", "health"]]

mylist = ["tim tam", "ice cream", "fresh vegetables", "good for health"]

def transform(x):
    def match(i):
        for e in mylist:
            el = e.split()
            if x[i:i+len(el)] == el:
                return e, len(el)
        return x[i], 1
    i = 0
    while i < len(x):
        e, l = match(i)
        yield e
        i += l
answer = [list(transform(x)) for x in mycookbook]
print(answer)
'''
[['i', 'love', 'tim tam', 'and', 'ice cream'],
 ['cooking', 'fresh vegetables', 'is', 'easy'],
 ['fresh vegetables', 'are', 'good for health']]
'''
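One possible way to index mylist, as suggested above, is to group the phrases by their first word so that only phrases that can start at the current position are compared. This is a sketch under assumptions: the names `build_index` and `transform_indexed` are hypothetical, and trying longer phrases first is a design choice added here, whereas the original code takes the first match in list order.

```python
from collections import defaultdict


def build_index(phrases):
    """Map each first word to the (words, phrase) pairs starting with it."""
    index = defaultdict(list)
    for phrase in phrases:
        words = phrase.split()
        if words:  # ignore empty entries
            index[words[0]].append((words, phrase))
    for candidates in index.values():
        # assumption: prefer the longest phrase when several could match
        candidates.sort(key=lambda item: len(item[0]), reverse=True)
    return index


def transform_indexed(sentence, index):
    """Yield the words of sentence with indexed phrases merged."""
    i = 0
    while i < len(sentence):
        # only phrases whose first word equals sentence[i] are examined
        for words, phrase in index.get(sentence[i], ()):
            if sentence[i:i + len(words)] == words:
                yield phrase
                i += len(words)
                break
        else:
            yield sentence[i]
            i += 1


mycookbook = [["i", "love", "tim", "tam", "and", "ice", "cream"],
              ["cooking", "fresh", "vegetables", "is", "easy"],
              ["fresh", "vegetables", "are", "good", "for", "health"]]
mylist = ["tim tam", "ice cream", "fresh vegetables", "good for health"]

phrase_index = build_index(mylist)
answer = [list(transform_indexed(x, phrase_index)) for x in mycookbook]
print(answer)
```

The index is built once and reused for every sentence, so each position costs a dictionary lookup plus comparisons against the few phrases that share its first word.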