Question

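I need help iterating through a list of sentences/strings and erasing each string's characters from a matched word onward, based on another list of words.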

sentences = ['im not george smith my name is lucas mangulu thank you',
             'how shall i call you george smith oh okay got it',
             'we have detected a miyagi chung in the traffic flow']

words = ['lucas mangulu', 'george smith', 'miyagi chung']

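I know I have to loop over each element of the sentences list. But then I'm stuck on how to do a find() with the words list, e.g. looking for the element of words in the element of sentences at the same position. The final result should be: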

sentences = ['im not george smith my name is',
             'how shall i call you',
             'we have detected a']

#OR

sentences = ['im not george smith my name is lucas mangulu',
             'how shall i call you george smith',
             'we have detected a miyagi chung']

Answer 1

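I'm having trouble understanding what you're looking for, but here is a simple idea for removing the strings in words from the strings in sentences; it uses many calls to str.replace().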

>>> words = ['lucas mangulu', 'george smith', 'miyagi chung']
>>> original_sentences = [
...     'im not george smith my name is lucas mangulu thank you',
...     'how shall i call you george smith oh okay got it',
...     'we have detected a miyagi chung in the traffic flow',
... ]
>>> original_sentences
['im not george smith my name is lucas mangulu thank you',
 'how shall i call you george smith oh okay got it',
 'we have detected a miyagi chung in the traffic flow']

>>> sentences = list(original_sentences)                  # make a copy
>>> for i in range(len(sentences)):
...     for w in words:                                   # remove words
...         sentences[i] = sentences[i].replace(w, '')
...     while '  ' in sentences[i]:                       # remove double whitespaces
...         sentences[i] = sentences[i].replace('  ', ' ')
>>> sentences
['im not my name is thank you',
 'how shall i call you oh okay got it',
 'we have detected a in the traffic flow']
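
The same removal could also be sketched as a single regular-expression pass instead of repeated str.replace() calls. This is only an alternative, assuming the entries in words are literal phrases; the helper name remove_phrases is made up for illustration.

```python
import re

words = ['lucas mangulu', 'george smith', 'miyagi chung']
original_sentences = [
    'im not george smith my name is lucas mangulu thank you',
    'how shall i call you george smith oh okay got it',
    'we have detected a miyagi chung in the traffic flow',
]

# Build one alternation pattern; re.escape keeps each phrase literal.
pattern = re.compile('|'.join(re.escape(w) for w in words))

def remove_phrases(sentence):
    """Hypothetical helper: drop every listed phrase, then tidy whitespace."""
    stripped = pattern.sub('', sentence)
    # split() with no arguments splits on runs of whitespace, so joining
    # with a single space collapses the gaps left by the removed phrases.
    return ' '.join(stripped.split())

sentences = [remove_phrases(s) for s in original_sentences]
print(sentences)
```

Unlike the while loop above, the split/join step also trims leading and trailing whitespace, which may or may not be what you want.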

Is this what you intended to do?

If you only want to remove a single word from all the sentences, you can drop the nested for loop:

>>> sentences = list(original_sentences)                  # make a copy
>>> word_to_remove = words[0]                             # pick one
>>> for i in range(len(sentences)):
...     sentences[i] = sentences[i].replace(word_to_remove, '')
>>> sentences
['im not george smith my name is  thank you',
 'how shall i call you george smith oh okay got it',
 'we have detected a miyagi chung in the traffic flow']

Answer 2

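You gave two example outputs for one input, which is very confusing. The following code may help you, but I can't logically work out how to match your examples exactly.

That said, I have a hunch that this is what you want.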

import re
sentences = ['im not george smith my name is lucas mangulu thank you',
             'how shall i call you george smith oh okay got it',
             'we have detected a miyagi chung in the traffic flow',
             'Is this valid?']

words = ['lucas mangulu', 'george smith', 'miyagi chung', 'test']
occurrences = []
for sentence in sentences:
    # If you want to find all occurrences in a sentence, this line will help you
    # occurrences.append([(x.start(), x.end(), x.group()) for x in re.finditer('|'.join(words), sentence)])

    # Look for the first occurrence of any of the words in this sentence
    search_result = re.search('|'.join(words), sentence)
    # If we found a word in this sentence
    if search_result:
        occurrences.append((search_result.start(), search_result.end(), search_result.group()))
    else:  # No word found: both slices below then print an empty line
        occurrences.append((0, 0, None))

# Example output 1:
# oc in this case is (start_index, end_index, word_found) for each sentence.
for index, oc in enumerate(occurrences):
    print(sentences[index][:oc[1]])

# Example output 2:
for index, oc in enumerate(occurrences):
    print(sentences[index][:oc[0]])

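Example output 1:

im not george smith
how shall i call you george smith
we have detected a miyagi chung

Example output 2:

im not
how shall i call you
we have detected a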
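
For what it's worth, the expected results in the question seem to pair each sentence with the word at the same index: the first sentence is cut at 'lucas mangulu', not at the earlier 'george smith', whereas the re.search version above stops at whichever listed word appears first. Here is a minimal sketch under that pairing assumption, using str.find() as the question suggests. The names truncate_at and keep_word are made up for illustration.

```python
sentences = [
    'im not george smith my name is lucas mangulu thank you',
    'how shall i call you george smith oh okay got it',
    'we have detected a miyagi chung in the traffic flow',
]
words = ['lucas mangulu', 'george smith', 'miyagi chung']

def truncate_at(sentence, word, keep_word=False):
    """Hypothetical helper: cut the sentence at the first occurrence of word."""
    start = sentence.find(word)
    if start == -1:  # word not present: leave the sentence unchanged
        return sentence
    # Either stop just before the word or just after it.
    end = start + len(word) if keep_word else start
    return sentence[:end].rstrip()

# Assumption: words[i] is the phrase to look for in sentences[i].
without_word = [truncate_at(s, w) for s, w in zip(sentences, words)]
with_word = [truncate_at(s, w, keep_word=True) for s, w in zip(sentences, words)]
print(without_word)
print(with_word)
```

If the two lists are not meant to line up index by index, zip() is the wrong tool here and the regex search over all words is the closer fit.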
