Question

I have the following list of lists.

mycookbook= [["i", "love", "tim", "tam", "and", "ice", "cream"], ["cooking", 
"fresh", "vegetables", "is", "easy"], ["fresh", "vegetables", "are", "good", 
"for", "health"]]

I also have the following list.

mylist = ["tim tam", "ice cream", "fresh vegetables"]

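Now I want to find the consecutive words in mycookbook that form an entry of mylist and combine them, updating mycookbook.

I am currently doing the following.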

for sentence in mycookbook:
    for sub in sentence:
        if sub == mylist[0].split(" ")[0]:
            pass  # stuck here: how do I check the next word?

But I don't know how to detect the next word, since there is no next() command. Please help me.

Answer 1

You want to iterate over indices, looking ahead as far as necessary each time. So, something like this:

new_sentence = []
index = 0
while index < len(sentence):
    for word in mylist:
        wordlist = word.split()
        if sentence[index:][:len(wordlist)] == wordlist: # This will take the first `len(wordlist)` elements and see if it's a match
            new_sentence.append(word)
            index += len(wordlist)
            break
    else:
        new_sentence.append(sentence[index])
        index += 1

You can try it here: Try it Online!
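The loop above processes a single `sentence`. As a minimal sketch, it could be wrapped in a function and applied to every sentence in `mycookbook`; the name `merge_phrases` is an assumption introduced here for illustration and is not part of the original answer.

```python
def merge_phrases(sentence, phrases):
    """Return a new list in which any phrase found in sentence is one item."""
    new_sentence = []
    index = 0
    while index < len(sentence):
        for phrase in phrases:
            wordlist = phrase.split()
            # Compare the next len(wordlist) words with the words of the phrase.
            if sentence[index:index + len(wordlist)] == wordlist:
                new_sentence.append(phrase)
                index += len(wordlist)
                break
        else:
            # No phrase starts at this position, so keep the word unchanged.
            new_sentence.append(sentence[index])
            index += 1
    return new_sentence


mycookbook = [["i", "love", "tim", "tam", "and", "ice", "cream"],
              ["cooking", "fresh", "vegetables", "is", "easy"],
              ["fresh", "vegetables", "are", "good", "for", "health"]]
mylist = ["tim tam", "ice cream", "fresh vegetables"]

new_cookbook = [merge_phrases(sentence, mylist) for sentence in mycookbook]
print(new_cookbook)
```

This builds a new list instead of modifying `mycookbook` in place, and it assumes every entry of `mylist` contains at least one word.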

Answer 2

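You can iterate over each sentence in the original mycookbook. Then, for each sentence, start with a pointer at the first word.

Case 1: if sentence[i] + ' ' + sentence[i+1] is not in mylist, simply add sentence[i] to the new sentence and move the pointer forward by 1.
Case 2: if sentence[i] + ' ' + sentence[i+1] is in mylist, add it to the new sentence as a single word and move the pointer forward by 2.

An example follows.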

mycookbook= [["i", "love", "tim", "tam", "and", "ice", "cream"], ["cooking",
"fresh", "vegetables", "is", "easy"], ["fresh", "vegetables", "are", "good",
"for", "health"]]

mylist = ["tim tam", "ice cream", "fresh vegetables"]

mycookbook_new = []
for sentence in mycookbook:
    i = 0
    sentence_new = []
    while i < len(sentence):
        if (i == len(sentence)-1 or sentence[i] + ' ' + sentence[i+1] not in mylist):
            sentence_new.append(sentence[i]) # unchanged
            i += 1
        else:
            sentence_new.append(sentence[i] + ' ' + sentence[i+1])
            i += 2
    mycookbook_new.append(sentence_new)

print(mycookbook_new)
'''
[
  ['i', 'love', 'tim tam', 'and', 'ice cream'], 
  ['cooking', 'fresh vegetables', 'is', 'easy'], 
  ['fresh vegetables', 'are', 'good', 'for', 'health']
]
'''

Answer 3

mycookbook= [["i", "love", "tim", "tam", "and", "ice", "cream"], ["cooking",
"fresh", "vegetables", "is", "easy"], ["fresh", "vegetables", "are", "good",
"for", "health"]]


mylist = ["tim tam", "ice cream", "fresh vegetables"]

result_cookbook = []
for cb in mycookbook:
    cook_book = []
    need_continue = False
    for index, word in enumerate(cb):
        if need_continue:
            need_continue = False
            continue 
        if index < len(cb) - 1:
            # can combine with next word
            combine_word = "{} {}".format(cb[index], cb[index+1])
            if combine_word in mylist:
                cook_book.append(combine_word)
                need_continue = True
            else:
                cook_book.append(word)
        else:
            cook_book.append(word)
    result_cookbook.append(cook_book)
print(result_cookbook)

Answer 4

Use zip to iterate over each word paired with the word that follows it. If the pair is in mylist, append it as a single string and skip the next iteration.

out = []
for sentence in mycookbook:
    new_sentence = []
    skip = False
    for pairs in zip(sentence, sentence[1:]+['']):
        if skip:
            skip = False
            continue
        if ' '.join(pairs) in mylist:
            new_sentence.append(' '.join(pairs))
            skip = True
        else:
            new_sentence.append(pairs[0])
    out.append(new_sentence)

Answer 5

for sentence in mycookbook:
    i = 0
    while i < len(sentence) - 1:  # i + 1 must stay a valid index, including for the final pair
        for m in mylist:

            words = m.split(' ')
            if sentence[i] == words[0]:
                for j in range(1, len(words)):
                    if sentence[i + 1] != words[j]:
                        break

                    sentence[i] += ' ' + words[j]
                    sentence.pop(i + 1)
        i += 1

Answer 6

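A more readable version, split into smaller functions.

Notes

The solution does not use any numeric indices at all.
It does not rely on helpers such as zip or range.
It does not mutate any objects: there is no pop, append, or +=.
It can easily be modified to read from an input file and print to another file.
If modified to read from and write to files, it uses minimal memory, because nothing is stored in a list. In other words, it works lazily.

Code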

def as_pairs(iterable):
    """
    Yields (item, next_item) for every item of iterable.
    The last item is paired with None so that it is not lost.
    """
    iterator = iter(iterable)
    try:
        current_item = next(iterator)
    except StopIteration:
        return
    for next_item in iterator:
        yield current_item, next_item
        current_item = next_item
    yield current_item, None


def merge_pairs(pair_words, word_list):
    """
    If the pair words are part of the word_list, merges them to one
    """
    pair_map = {tuple(pair_word.split(" ")): pair_word for pair_word in pair_words}
    skip = False
    for pair in as_pairs(word_list):
        if skip:
            # This pair starts with the second word of a pair that was just merged.
            skip = False
        elif pair in pair_map:
            yield pair_map[pair]
            skip = True
        else:
            first, second = pair
            yield first


def main():
    mycookbook = [
        ["i", "love", "tim", "tam", "and", "ice", "cream"],
        ["cooking", "fresh", "vegetables", "is", "easy"],
        ["fresh", "vegetables", "are", "good", "for", "health"],
    ]

    mylist = ["tim tam", "ice cream", "fresh vegetables"]
    return [list(merge_pairs(mylist, sentence)) for sentence in mycookbook]


print(main())

Output:

[['i', 'love', 'tim tam', 'and', 'ice cream'], ['cooking', 'fresh vegetables', 'is', 'easy'], ['fresh vegetables', 'are', 'good', 'for', 'health']]
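The notes in this answer mention that the approach can be adapted to read from one file and write to another without holding everything in memory. The following is a rough sketch of that idea, not the answerer's code: `merge_words` and `merge_stream` are hypothetical names, the input is assumed to hold one whitespace-separated sentence per line, and the `" | "` separator in the output is an arbitrary choice made so that merged phrases stay visible. `io.StringIO` stands in for real files.

```python
import io


def merge_words(words, phrases):
    """Lazily yield words, merging adjacent pairs that form one of the phrases."""
    pair_map = {tuple(p.split(" ")): p for p in phrases}
    iterator = iter(words)
    try:
        current = next(iterator)
    except StopIteration:
        return  # empty sentence
    skip = False
    for following in iterator:
        if skip:
            # current was already emitted as the second half of a merged pair.
            skip = False
        elif (current, following) in pair_map:
            yield pair_map[(current, following)]
            skip = True
        else:
            yield current
        current = following
    if not skip:
        yield current  # the last word, unless it was merged


def merge_stream(source, target, phrases):
    """Process source line by line, writing each merged sentence to target."""
    for line in source:
        target.write(" | ".join(merge_words(line.split(), phrases)) + "\n")


# In-memory stand-ins for an input file and an output file.
source = io.StringIO("i love tim tam and ice cream\ncooking fresh vegetables is easy\n")
target = io.StringIO()
merge_stream(source, target, ["tim tam", "ice cream", "fresh vegetables"])
print(target.getvalue())
```

Only one line is held in memory at a time, so the same functions should work with objects returned by `open()` in place of the `StringIO` instances.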

Answer 7

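Here is a solution. If you care about performance, mylist should be indexed in some way so that the match function can do better than a sequential lookup.

Bonus: entries in mylist can contain any number of words, not just two. Note the added "good for health".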

mycookbook= [["i", "love", "tim", "tam", "and", "ice", "cream"], ["cooking",
"fresh", "vegetables", "is", "easy"], ["fresh", "vegetables", "are", "good",
"for", "health"]]

mylist = ["tim tam", "ice cream", "fresh vegetables", "good for health"]

def transform(x):
    def match(i):
        for e in mylist:
            el = e.split()
            if x[i:i+len(el)] == el:
                return e, len(el)
        return x[i], 1
    i = 0
    while i < len(x):
        e, l = match(i)
        yield e
        i += l
answer = [list(transform(x)) for x in mycookbook]
print(answer)
'''
[['i', 'love', 'tim tam', 'and', 'ice cream'],
 ['cooking', 'fresh vegetables', 'is', 'easy'],
 ['fresh vegetables', 'are', 'good for health']]
'''
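The answer suggests indexing mylist so that matching does not have to scan every entry. One possible way to do that, sketched here under assumptions, is a dictionary keyed by the first word of each phrase. The names `build_index` and `transform_indexed` are hypothetical, every phrase is assumed to be non-empty, and trying longer phrases first is a design choice of this sketch rather than something the original answer specifies.

```python
from collections import defaultdict


def build_index(phrases):
    """Map the first word of each phrase to a list of (words, phrase) candidates."""
    index = defaultdict(list)
    for phrase in phrases:
        words = phrase.split()
        index[words[0]].append((words, phrase))
    # Try longer phrases first so that the longest match wins.
    for candidates in index.values():
        candidates.sort(key=lambda candidate: len(candidate[0]), reverse=True)
    return index


def transform_indexed(sentence, index):
    """Yield the items of sentence, merging any indexed phrase into one item."""
    i = 0
    while i < len(sentence):
        # Only phrases that start with sentence[i] are checked.
        for words, phrase in index.get(sentence[i], ()):
            if sentence[i:i + len(words)] == words:
                yield phrase
                i += len(words)
                break
        else:
            yield sentence[i]
            i += 1


mycookbook = [["i", "love", "tim", "tam", "and", "ice", "cream"],
              ["cooking", "fresh", "vegetables", "is", "easy"],
              ["fresh", "vegetables", "are", "good", "for", "health"]]
mylist = ["tim tam", "ice cream", "fresh vegetables", "good for health"]

phrase_index = build_index(mylist)
answer = [list(transform_indexed(sentence, phrase_index)) for sentence in mycookbook]
print(answer)
```

With this layout, each position in a sentence costs one dictionary lookup plus a comparison against the few phrases that share its first word, instead of a pass over all of mylist.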

Search for two consecutive words and combine them in Python
