I have a list "abc" (of strings), and I am trying to remove from it certain words that appear in the list "stop", as well as all the numbers that appear in abc.
abc=[ 'issues in performance 421',
'how are you doing',
'hey my name is abc, 143 what is your name',
'attention pleased',
'compliance installed 234']
stop=['attention', 'installed']
I am using a list comprehension to remove them, but the code below does not remove the words.
new_word=[word for word in abc if word not in stop ]
Result (note that the words are still there):
['issues in performance',
'how are you doing',
'hey my name is abc, what is your name',
'attention pleased',
'compliance installed']
Desired output:
['issues in performance',
'how are you doing',
'hey my name is abc, what is your name',
'pleased',
'compliance']
Thanks
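Answer 0 (score: 5)
You need to split each phrase into words, filter out the words that are in `stop`, and then join the remaining words back into a phrase.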
[' '.join(w for w in p.split() if w not in stop) for p in abc]
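With the `abc` and `stop` from the question, this outputs the following (the stop words are gone, but the numbers remain, because the expression only filters on `stop`):
['issues in performance 421', 'how are you doing', 'hey my name is abc, 143 what is your name', 'pleased', 'compliance 234']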
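To also drop the numbers the question asks about, the same split/filter/join idea can be extended with a `str.isdigit` check. This is a minimal sketch, assuming numbers always appear as standalone whitespace-separated tokens; `stop_set` and `cleaned` are names introduced here for illustration only.

```python
abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

# A set gives average O(1) membership tests (illustrative name).
stop_set = set(stop)

# Split each phrase into tokens, drop stop words and all-digit tokens,
# then join what is left back into a phrase.
cleaned = [' '.join(w for w in p.split()
                    if w not in stop_set and not w.isdigit())
           for p in abc]
print(cleaned)
```

A token such as `abc,` survives because it is compared as a whole string, punctuation included, so a stop word followed by a comma would not be removed by this sketch.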
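Answer 1 (score: 1)
You can solve this with `set`. Because each item may contain several words, you cannot apply `in` to the item directly. Instead, use `set` together with `&` to get the words the item has in common with `stop`. If any word of the item is in `stop`, the intersection is non-empty and therefore evaluates as true. Since you only care about the remaining items, we can use `if not` here.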
new_word=[word for word in abc if not set(word.split(' ')) & set(stop)]
Update
If you also want to drop every item that contains a number, just do the following:
new_word=[word for word in abc if not (set(word.split(' ')) & set(stop) or any([i.strip().isdigit() for i in word.split(' ')]))]
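A quick check of what these two expressions return on the question's data may help. Both of them filter whole items out of the list rather than removing words from inside an item, so the result differs from the desired output in the question. The names `without_stop` and `without_stop_or_digits` are introduced here for illustration.

```python
abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']
stop_set = set(stop)

# Keep only the items that share no word with stop_set.
without_stop = [p for p in abc if not set(p.split()) & stop_set]

# Additionally drop the items that contain an all-digit token.
without_stop_or_digits = [
    p for p in abc
    if not (set(p.split()) & stop_set
            or any(tok.isdigit() for tok in p.split()))
]

print(without_stop)
print(without_stop_or_digits)
```

On this data the first list keeps three items and the second keeps only 'how are you doing'.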
Answer 2 (score: 1)
Here is a solution that uses a simple regular expression with the `re.sub` method. This solution also removes the numbers.
import re
abc=[ 'issues in performance 421',
'how are you doing',
'hey my name is abc, 143 what is your name',
'attention pleased',
'compliance installed 234']
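# Raw strings keep the regex escape \s intact; the last pattern matches any single digit.
stop=[r'attention\s+', r'installed\s+', r'[0-9]']
# Join the patterns into one alternation and delete every match.
[re.sub('|'.join(stop), '', x) for x in abc]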
Output (whitespace next to a removed number is left in place):
['issues in performance ',
'how are you doing',
'hey my name is abc,  what is your name',
'pleased',
'compliance ']
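The trailing spaces in that output come from deleting a token without the whitespace next to it. One possible variant, sketched here under the assumption that stop words and numbers should only match as whole words, builds the pattern with `re.escape` and `\b` and then normalizes whitespace. The names `pattern` and `clean` are hypothetical and not part of the original answer.

```python
import re

abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

# \b keeps 'installed' from matching inside a longer word such as
# 'reinstalled'; \d+ matches a run of digits as one token.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, stop)) + r'|\d+)\b')

def clean(phrase):
    # Delete the matches, then collapse the whitespace they leave behind.
    return ' '.join(pattern.sub('', phrase).split())

result = [clean(p) for p in abc]
print(result)
```

Escaping the stop words with `re.escape` means they are treated as literal text, which is probably what is wanted when the list comes from user data rather than hand-written patterns.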
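Answer 3 (score: 1)
list1 = []
for word in abc:
    word1 = word
    for remove_word in stop:
        # str.replace removes every occurrence of the substring
        word1 = word1.replace(remove_word, '')
    # append once per phrase, collapsing the whitespace left behind
    list1.append(' '.join(word1.split()))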
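One caveat worth keeping in mind with `str.replace`: it matches substrings rather than whole words, and it leaves numbers untouched. A small illustration follows, using a made-up phrase that is not part of the question's data; `by_replace` and `by_tokens` are illustrative names.

```python
stop = ['attention', 'installed']

phrase = 'reinstalled drivers 42'  # hypothetical input, not from the question

# Substring removal: 'installed' is cut out of 'reinstalled'.
by_replace = phrase
for remove_word in stop:
    by_replace = by_replace.replace(remove_word, '')

# Token-based removal: only whole words equal to a stop word are dropped.
by_tokens = ' '.join(w for w in phrase.split() if w not in stop)

print(by_replace)  # re drivers 42
print(by_tokens)   # reinstalled drivers 42
```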
Answer 4 (score: 1)
This is at least what I would do:
abc=[ 'issues in performance 421',
'how are you doing',
'hey my name is abc, 143 what is your name',
'attention pleased',
'compliance installed 234'
]
stop=['attention', 'installed']
for x, elem in enumerate(abc):
    # Keep the words that are neither stop words nor all-digit tokens.
    abc[x] = " ".join(filter(lambda w: w not in stop and not w.isdigit(), elem.split()))
print(abc)
Result:
['issues in performance',
'how are you doing',
'hey my name is abc, what is your name',
'pleased',
'compliance']
Hope this helps :)