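I need help iterating over a list of sentences/strings and erasing the characters that come after a matched string, based on another list of words.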
sentences = ['im not george smith my name is lucas mangulu thank you',
             'how shall i call you george smith oh okay got it',
             'we have detected a miyagi chung in the traffic flow']
words = ['lucas mangulu', 'george smith', 'miyagi chung']
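I know I have to loop over each element of the sentences list. But then I'm stuck on how to do a find() for, say, the corresponding element of the words list. The final result should be: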
sentences = ['im not george smith my name is',
             'how shall i call you',
             'we have detected a']
#OR
sentences = ['im not george smith my name is lucas mangulu',
             'how shall i call you george smith',
             'we have detected a miyagi chung']
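Answer 0 (score: 1)
I'm having trouble understanding what you're looking for, but here is a simple idea for removing the strings in words from the strings in sentences; it uses many calls to str.replace().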
>>> words = ['lucas mangulu', 'george smith', 'miyagi chung']
>>> original_sentences = [
... 'im not george smith my name is lucas mangulu thank you',
... 'how shall i call you george smith oh okay got it',
... 'we have detected a miyagi chung in the traffic flow',
... ]
>>> original_sentences
['im not george smith my name is lucas mangulu thank you',
'how shall i call you george smith oh okay got it',
'we have detected a miyagi chung in the traffic flow']
>>> sentences = list(original_sentences) # make a copy
>>> for i in range(len(sentences)):
...     for w in words:  # remove words
...         sentences[i] = sentences[i].replace(w, '')
...     while '  ' in sentences[i]:  # collapse double spaces left behind
...         sentences[i] = sentences[i].replace('  ', ' ')
...
>>> sentences
['im not my name is thank you',
'how shall i call you oh okay got it',
'we have detected a in the traffic flow']
Is this what you intended to do?
If you only want to remove one word from all the sentences, you can drop the nested for loop:
>>> sentences = list(original_sentences) # make a copy
>>> word_to_remove = words[0] # pick one
>>> for i in range(len(sentences)):
...     sentences[i] = sentences[i].replace(word_to_remove, '')
...
>>> sentences
['im not george smith my name is  thank you',
'how shall i call you george smith oh okay got it',
'we have detected a miyagi chung in the traffic flow']
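The same removal can also be done in one pass per sentence with a regular expression. This is only a sketch of an alternative, not part of the answer above: the helper name remove_phrases is made up for illustration, and it assumes that collapsing every run of whitespace to a single space is acceptable.

```python
import re

words = ['lucas mangulu', 'george smith', 'miyagi chung']
sentences = [
    'im not george smith my name is lucas mangulu thank you',
    'how shall i call you george smith oh okay got it',
    'we have detected a miyagi chung in the traffic flow',
]

# Build one alternation pattern; re.escape guards against phrases
# that happen to contain regex metacharacters.
pattern = re.compile('|'.join(re.escape(w) for w in words))


def remove_phrases(sentence):
    """Delete every listed phrase, then normalise whitespace (hypothetical helper)."""
    stripped = pattern.sub('', sentence)
    # str.split() with no argument splits on runs of whitespace,
    # so joining with a single space removes the gaps left behind.
    return ' '.join(stripped.split())


cleaned = [remove_phrases(s) for s in sentences]
print(cleaned)
```

Building a new list leaves the original sentences untouched, so no explicit copy is needed.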
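Answer 1 (score: 0)
You gave two example outputs for a single input, which is very confusing. The code below may help you, but I can't logically work out how to match your examples exactly.
That said, I have a hunch this is what you want.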
import re

sentences = ['im not george smith my name is lucas mangulu thank you',
             'how shall i call you george smith oh okay got it',
             'we have detected a miyagi chung in the traffic flow',
             'Is this valid?']
words = ['lucas mangulu', 'george smith', 'miyagi chung', 'test']
ocurrences = []
for sentence in sentences:
    # If you want to find all occurrences in a sentence, this line will help you
    # ocurrences.append([(x.start(), x.end(), x.group()) for x in re.finditer('|'.join(words), sentence)])
    # Look for any of the words in this sentence (leftmost match only)
    search_result = re.search('|'.join(words), sentence)
    # If we found a word in this sentence
    if search_result:
        ocurrences.append((search_result.start(), search_result.end(), search_result.group()))
    else:  # No word found
        ocurrences.append((0, 0, None))

# Example output 1:
# oc in this case is (start_index, end_index, word_found) for each sentence.
for index, oc in enumerate(ocurrences):
    print(sentences[index][:oc[1]])

# Example output 2:
for index, oc in enumerate(ocurrences):
    print(sentences[index][:oc[0]])
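Example output 1:
im not george smith
how shall i call you george smith
we have detected a miyagi chung
Example output 2:
im not
how shall i call you
we have detected a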
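One reading that reproduces both outputs in the question is that the i-th sentence is meant to be cut at the i-th entry of words. That pairing is an assumption, not something the question states outright. Under it, a minimal sketch with zip() and str.find() might look like this; truncate_at and keep_word are hypothetical names introduced here.

```python
sentences = [
    'im not george smith my name is lucas mangulu thank you',
    'how shall i call you george smith oh okay got it',
    'we have detected a miyagi chung in the traffic flow',
]
words = ['lucas mangulu', 'george smith', 'miyagi chung']


def truncate_at(sentence, word, keep_word=False):
    """Cut sentence at the first occurrence of word (hypothetical helper).

    With keep_word=False the word itself is dropped too;
    with keep_word=True only the text after it is dropped.
    """
    start = sentence.find(word)
    if start == -1:
        # Word not present: leave the sentence unchanged.
        return sentence
    end = start + len(word) if keep_word else start
    # rstrip() removes the space that preceded the removed text.
    return sentence[:end].rstrip()


# Assumption: sentence i pairs with word i, hence zip().
before = [truncate_at(s, w) for s, w in zip(sentences, words)]
after = [truncate_at(s, w, keep_word=True) for s, w in zip(sentences, words)]
print(before)
print(after)
```

Unlike the re.search approach above, which stops at whichever listed word appears first in a sentence, this only looks for the single word paired with that sentence, which is why the first sentence keeps "george smith".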