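Remove a list of words from a string

Date: 2014-08-17 03:23:04

Tags: python string

I have a list of stopwords and a search string. I want to remove the stopwords from the string.

For example: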

stopwords=['what','who','is','a','at','is','he']
query='What is hello'

The code should remove 'What' and 'is'. In my case, however, it also removes 'a' and 'at'. My code is below. What am I doing wrong?

for word in stopwords:
    if word in query:
        print word
        query=query.replace(word,"")

If the input query is "What is Hello", I get this output:
wht s llo

Why does this happen?

6 answers:

Answer 0 (score: 34)

Here is one way to do it:

query = 'What is hello'
stopwords = ['what','who','is','a','at','is','he']
querywords = query.split()

resultwords  = [word for word in querywords if word.lower() not in stopwords]
result = ' '.join(resultwords)

print(result)

I noticed that you also want to remove a word when its lowercase form is in the list, so I added a call to lower() in the condition.
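The same idea can be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: the name remove_stopwords is hypothetical, and converting the stopwords to a set is an assumption that only pays off when the list is long.

```python
def remove_stopwords(query, stopwords):
    """Return query without the words whose lowercase form is a stopword."""
    # Hypothetical helper; a set gives constant-time membership tests on average.
    stopword_set = set(stopwords)
    kept = [word for word in query.split() if word.lower() not in stopword_set]
    return ' '.join(kept)


stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
print(remove_stopwords('What is hello', stopwords))  # hello
```

Because whole words are compared, 'hello' survives even though it contains the stopword 'he'.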

Answer 1 (score: 4)

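Looking at the other answers to your question, I noticed that they tell you how to do what you want, but they don't answer the question you asked at the end:

    If the input query is "What is Hello", I get the output:

    wht s llo

    Why does this happen?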

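This happens because .replace() replaces every occurrence of exactly the substring you give it, including occurrences inside longer words.

For example: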

"My, my! Hello my friendly mystery".replace("my", "")

gives:

>>> "My, ! Hello  friendly stery"
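Applied to the loop from the question, the same substring matching can be traced one stopword at a time. The sketch below is an illustration, not part of the original answer; the steps list is a hypothetical name used only to record each intermediate string.

```python
stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
query = 'What is hello'

steps = []  # hypothetical log of (stopword, query after removal)
for word in stopwords:
    if word in query:  # substring test, not a whole-word test
        query = query.replace(word, "")
        steps.append((word, query))

for word, after in steps:
    print(repr(word), '->', repr(after))
# 'is' -> 'What  hello'
# 'a' -> 'Wht  hello'
# 'he' -> 'Wht  llo'
```

'what' never matches because the test is case-sensitive, while 'a' and 'he' match inside 'What' and 'hello'.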

.replace() effectively splits the string on the substring given as the first argument, then joins the pieces back together with the second argument.

"hello".replace("he", "je")

is logically similar to:

"je".join("hello".split("he"))

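If you still want to use .replace to remove whole words, you might think that adding a space before and after the word is enough. That leaves behind words at the beginning and end of the string, as well as occurrences next to punctuation, and padding only one side does not fix it: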

"My, my! hello my friendly mystery".replace(" my ", " ")
>>> "My, my! hello friendly mystery"

"My, my! hello my friendly mystery".replace(" my", "")
>>> "My,! hello friendlystery"

"My, my! hello my friendly mystery".replace("my ", "")
>>> "My, my! hello friendly mystery"

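Padding with spaces on both sides also fails to catch consecutive duplicates. Matches cannot overlap, so the space that ends the first match is already consumed and cannot start the second: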

"hello my my friend".replace(" my ", " ")
>>> "hello my friend"

For these reasons, the answer you accepted, by Robby Cornelissen, is the recommended way to do what you want.
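If a replace-style approach is still preferred, one option this answer does not cover is a regular expression with \b word boundaries. The following is a sketch under that assumption: remove_whole_words is a hypothetical name, matching is made case-insensitive by choice, and \b treats letters, digits, and underscores as word characters.

```python
import re


def remove_whole_words(text, words):
    # Build one alternation such as \b(?:what|who|is)\b; re.escape guards
    # against words that contain regex metacharacters.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    stripped = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the runs of whitespace left behind by the removed words.
    return " ".join(stripped.split())


print(remove_whole_words("hello my my friend", ["my"]))
# hello friend
print(remove_whole_words("My, my! hello my friendly mystery", ["my"]))
# , ! hello friendly mystery
```

The consecutive duplicates are both removed and "mystery" is left intact, although the punctuation that surrounded the removed words stays in place.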

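Answer 2 (score: 4)

The accepted answer works when the words are separated by spaces, but in real text punctuation can separate words too. In that case re.split is needed.

Also, storing stopwords as a set makes the membership test faster (although with only a few words, the cost of hashing each string offsets part of the gain over scanning a list).

My proposal: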

import re

query = 'What is hello? Says Who?'
stopwords = {'what','who','is','a','at','is','he'}

resultwords  = [word for word in re.split(r"\W+", query) if word.lower() not in stopwords]
result = ' '.join(resultwords)
print(result)

Output:

hello Says 
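The trailing space in that output comes from an empty string that re.split returns when the text ends with a separator, here the final "?". One way to avoid it, offered as an assumption rather than as this answer's method, is to collect the words with re.findall instead of splitting on the separators:

```python
import re

query = 'What is hello? Says Who?'
stopwords = {'what', 'who', 'is', 'a', 'at', 'he'}

# re.split leaves an empty string where the text ends with a separator.
print(re.split(r"\W+", query))    # ['What', 'is', 'hello', 'Says', 'Who', '']

# re.findall returns only the runs of word characters.
words = re.findall(r"\w+", query)
result = ' '.join(word for word in words if word.lower() not in stopwords)
print(repr(result))               # 'hello Says'
```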

Answer 3 (score: 2)

Building on what karthikr said, try

' '.join(filter(lambda x: x.lower() not in stopwords,  query.split()))

Explanation:

query.split() # splits query on runs of whitespace, i.e. "What is hello" -> ["What","is","hello"]

filter(func,iterable) # takes a function and an iterable (list/string/etc.) and
                      # keeps only the items for which the function, called on one
                      # item at a time, returns True

lambda x: x.lower() not in stopwords   # anonymous function that takes one word,
                                       # converts it to lower case, and returns True if
                                       # the word is not in the iterable stopwords


' '.join(iterable) # joins all items of the iterable (items must be strings)
                   # using the string in front of the dot, i.e. ' ', as the separator,
                   # e.g. ["What", "is","hello"] -> "What is hello"
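Run step by step, the one-liner behaves as follows. This sketch assumes Python 3, where filter returns a lazy iterator, so it is wrapped in list() to make the intermediate result visible; the names words and kept are introduced only for illustration.

```python
stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
query = 'What is hello'

words = query.split()
print(words)   # ['What', 'is', 'hello']

# list() forces the lazy filter object so the kept words can be inspected.
kept = list(filter(lambda x: x.lower() not in stopwords, words))
print(kept)    # ['hello']

result = ' '.join(kept)
print(result)  # hello
```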

Answer 4 (score: 0)

stopwords=['for','or','to']
p='Asking for help, clarification, or responding to other answers.'
for word in stopwords:
  # str.replace matches substrings, so this would also cut 'or' out of a word such as 'word'
  p=p.replace(word,'')
print(p)

Answer 5 (score: -1)

" ".join([x for x in query.split() if x not in stopwords])