I have a list of stopwords and a search string. I want to remove the stopwords from the string.
For example:
stopwords=['what','who','is','a','at','is','he']
query='What is hello'
The code should remove 'What' and 'is'. In my case, however, it also removes 'a' and 'at'. My code is below. What am I doing wrong?
for word in stopwords:
    if word in query:
        print(word)
        query = query.replace(word, "")
If the input query is "What is Hello", I get this output:
wht s llo
Why does this happen?
Answer 0: (score: 34)
Here is one way to do it:
query = 'What is hello'
stopwords = ['what','who','is','a','at','is','he']
querywords = query.split()
resultwords = [word for word in querywords if word.lower() not in stopwords]
result = ' '.join(resultwords)
print(result)
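I noticed that you also want to remove a word when its lowercase variant is in the list, so I added a call to lower() in the condition check.
Answer 1: (score: 4)
Looking at the other answers to your question, I noticed that they tell you how to do what you want, but they don't answer the question you asked at the end.
If the input query is "What is Hello", I get this output: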
wht s llo
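Why does this happen?
This happens because .replace() replaces every occurrence of the exact substring you give it, including occurrences inside longer words.
For example: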
"My, my! Hello my friendly mystery".replace("my", "")
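gives:
>>> "My, ! Hello  friendly stery"
.replace() effectively splits the string on the substring given as the first argument and then joins the pieces back together with the second argument.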
"hello".replace("he", "je")
is logically similar to:
"je".join("hello".split("he"))
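The equivalence above can be checked directly. This is a minimal sketch; `replace_via_split` is a hypothetical helper introduced only for illustration, and it assumes a non-empty search string (str.split raises ValueError on an empty separator).

```python
def replace_via_split(text, old, new):
    # Hypothetical helper: reproduces str.replace for a non-empty `old`
    # by splitting on `old` and joining the pieces with `new`.
    return new.join(text.split(old))

samples = [
    ("hello", "he", "je"),
    ("My, my! Hello my friendly mystery", "my", ""),
]
for text, old, new in samples:
    # Both approaches give the same string for every sample
    assert replace_via_split(text, old, new) == text.replace(old, new)

print(replace_via_split("hello", "he", "je"))  # jello
```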
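If you still want to use .replace() to remove whole words, you might think that adding a space before and after the word would be enough, but that leaves behind words at the beginning and end of the string, as well as occurrences followed by punctuation.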
"My, my! hello my friendly mystery".replace(" my ", " ")
>>> "My, my! hello friendly mystery"
"My, my! hello my friendly mystery".replace(" my", "")
>>> "My,! hello friendlystery"
"My, my! hello my friendly mystery".replace("my ", "")
>>> "My, my! hello friendly mystery"
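Also, adding spaces before and after will not catch consecutive duplicates: the first match consumes the space that the second occurrence would need, and the search continues after it: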
"hello my my friend".replace(" my ", " ")
>>> "hello my friend"
For these reasons, the answer you accepted, by Robby Cornelissen, is the recommended way to do what you want.
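If a replace-style approach is still preferred, one alternative (not taken from any answer in this thread) is a regular expression with `\b` word boundaries, which matches whole words only. The sketch below uses a hypothetical helper, `remove_words`, and assumes that case-insensitive matching and collapsing leftover whitespace are acceptable. Punctuation next to a removed word stays in place.

```python
import re

def remove_words(text, words):
    # Hypothetical helper, shown only to illustrate whole-word matching.
    # \b matches at word boundaries, so "he" does not match inside "hello".
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    stripped = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the runs of spaces left where words were removed
    return " ".join(stripped.split())

print(remove_words("hello my my friend", ["my"]))  # hello friend
print(remove_words("What is hello", ['what', 'who', 'is', 'a', 'at', 'is', 'he']))  # hello
```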
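Answer 2: (score: 4)
The accepted answer works when the words are separated by spaces, but that is not always the case in real life, where punctuation can also separate words. In that case, re.split is required.
Also, testing against stopwords as a set makes the lookup faster (although with only a few words there is a tradeoff between the cost of hashing the string and the cost of the lookup).
My proposal: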
import re
query = 'What is hello? Says Who?'
stopwords = {'what','who','is','a','at','is','he'}
resultwords = [word for word in re.split(r"\W+", query) if word and word.lower() not in stopwords]
result = ' '.join(resultwords)
print(result)
Output:
hello Says
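One detail worth knowing: splitting on `\W+` produces an empty string at the end of the list when the query ends with punctuation (for example, `re.split(r"\W+", "hello?")` returns `['hello', '']`). A variation that avoids empty tokens is to collect the words with `re.findall` instead. This is a sketch under that assumption; `strip_stopwords` and `STOPWORDS` are hypothetical names, not part of the answer above.

```python
import re

# Hypothetical module-level set; a set gives constant-time membership tests
STOPWORDS = {'what', 'who', 'is', 'a', 'at', 'he'}

def strip_stopwords(query, stopwords=STOPWORDS):
    # \w+ matches runs of word characters, so punctuation never
    # produces empty tokens the way splitting on \W+ can.
    words = re.findall(r"\w+", query)
    return " ".join(w for w in words if w.lower() not in stopwords)

print(strip_stopwords('What is hello? Says Who?'))  # hello Says
```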
Answer 3: (score: 2)
Building on what karthikr said, try
' '.join(filter(lambda x: x.lower() not in stopwords, query.split()))
Explanation:
query.split()           # splits query on whitespace, i.e. "What is hello" -> ["What", "is", "hello"]
filter(func, iterable)  # takes a function and an iterable (list/string/etc.) and
                        # keeps only the items for which the function, called on
                        # one item at a time, returns a true value
lambda x: x.lower() not in stopwords  # anonymous function that takes a word,
                                      # converts it to lower case, and returns True if
                                      # the word is not in the iterable stopwords
' '.join(iterable)      # joins all items of the iterable (items must be strings)
                        # using the string in front of the dot, i.e. ' ', as the separator,
                        # e.g. ["What", "is", "hello"] -> "What is hello"
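As a quick check of the explanation above, the sketch below runs the one-liner on the question's data and compares it with an equivalent generator expression. In Python 3, filter returns a lazy iterator rather than a list, which ' '.join accepts directly. The variable names `kept`, `via_filter`, and `via_comprehension` are introduced here for illustration.

```python
stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
query = 'What is hello'

# filter() yields only the words whose lowercase form is not a stopword
kept = filter(lambda x: x.lower() not in stopwords, query.split())
via_filter = ' '.join(kept)

# The same selection written as a generator expression
via_comprehension = ' '.join(x for x in query.split() if x.lower() not in stopwords)

print(via_filter)  # hello
```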
Answer 4: (score: 0)
stopwords = ['for', 'or', 'to']
p = 'Asking for help, clarification, or responding to other answers.'
for i in stopwords:
    # str.replace matches substrings, so a stopword inside a longer word is removed too
    p = p.replace(i, '')
print(p)
Answer 5: (score: -1)
" ".join([x for x in query.split() if x not in stopwords])
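Note that this version compares words case-sensitively, so with the question's data 'What' is kept because only 'what' is in the list. A small sketch of the difference, using the question's values (the names `case_sensitive` and `case_insensitive` are introduced here for illustration):

```python
stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
query = 'What is hello'

# Case-sensitive test: 'What' != 'what', so it survives
case_sensitive = " ".join([x for x in query.split() if x not in stopwords])

# Lowercasing each word before the test removes it as intended
case_insensitive = " ".join([x for x in query.split() if x.lower() not in stopwords])

print(case_sensitive)    # What hello
print(case_insensitive)  # hello
```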