Question

I have a large string and a large list of stop words. I have created a small example below.

s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]

As you can imagine, I want the members of stop removed from s. I tried this.

for word in stop:
    s = s.replace(word,"")

I get this error.

AttributeError: 'list' object has no attribute 'replace'

Answer 1

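You need to do the following. Split s into a list of words. Then build a hash (a dict) from the stop word list. Then iterate over the word list and keep each word that is not in the hash.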

s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
arr = s.split(' ')
h = {i: 1 for i in stop}

result = []
for i in arr:
    if i not in h:
        result.append(i)

print(' '.join(result))
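
Since only membership is tested, a set works just as well as a dict here. A minimal sketch of the same steps wrapped in a function (the name remove_stop_words is made up for illustration and is not part of the original answer):

```python
def remove_stop_words(text, stop_words):
    """Return text without the space-separated words found in stop_words."""
    stop_set = set(stop_words)  # set membership tests are O(1) on average
    kept = [word for word in text.split(' ') if word not in stop_set]
    return ' '.join(kept)


s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
print(remove_stop_words(s, stop))
```

Note that "old." is kept, because the trailing period makes it a different token from "old".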

Answer 2

When you call s.replace(), s is a list, so you have probably changed s somewhere and it is now a list rather than a string.
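
For illustration, here is one way the error in the question can come about. The reassignment through split() is only an assumption about what might have happened in the asker's code; it is not shown in the question:

```python
s = "I am 20 years old."
s = s.split()  # hypothetical earlier step: s is now a list, not a str

try:
    s.replace("am", "")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(type(s).__name__)
print(error_message)
```

Printing type(s) just before the failing line is a quick way to confirm this in real code.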

This code works fine:

s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
for word in stop:
    s = s.replace(word,"")

Try to find where s is modified: search your code for an assignment to s.
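
One caveat with the loop above: str.replace matches substrings, so a stop word such as "in" would also be cut out of a longer word such as "inside", and the space next to each removed word is left behind. A hedged sketch of a whole-word variant using the re module (the function name remove_whole_words is hypothetical):

```python
import re


def remove_whole_words(text, stop_words):
    """Remove stop words only where they appear as whole words."""
    # \b marks a word boundary; re.escape guards against regex metacharacters.
    # The leading \s* also swallows the whitespace in front of a removed word.
    alternatives = '|'.join(re.escape(word) for word in stop_words)
    pattern = r'\s*\b(?:' + alternatives + r')\b'
    return re.sub(pattern, '', text).strip()


s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
print(remove_whole_words(s, stop))
```

Unlike the split-based answers, this variant also removes "old" from "old.", because the period counts as a word boundary.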

Answer 3

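The most elegant way is to use a set difference. Note that this returns the unique remaining words in no particular order, not the original sentence.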

z = list(set(s.split()) - set(stop))
print(z)

This prints the following (sets are unordered, so the order may vary):

['United', '20', 'I', 'live', 'years', 'States', 'America.', 'York', 'New', 'old.']

Unit test

import unittest

def so_26944574(string):
    stop = ["am", "old", "in", "of"]
    z = list(set(string.split()) - set(stop))
    return sorted(z)

# Unit Test
class Test(unittest.TestCase):
    def testcase(self):
        self.assertEqual(so_26944574("I am 20 years old. I live in New York in United States of America."), sorted(['United', '20', 'I', 'live', 'years', 'States', 'America.', 'York', 'New', 'old.']))
        self.assertEqual(so_26944574("I am very old but still strong, kind of"), sorted(['I', 'very', 'but', 'still', 'strong,', 'kind']))
unittest.main()

The test passes

Ran 1 test in 0.000s

OK

Answer 4

Another way to do it:

s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
s_list = s.split() # turn string into list
s = ' '.join([word for word in s_list if word not in stop]) # Make new string
>>> s
'I 20 years old. I live New York United States America.'
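
As the output shows, "old." survives because the trailing period makes it a different token from "old". If that is not wanted, one possible tweak is to compare each word with its surrounding punctuation stripped. This is only a sketch; dropping the whole token, punctuation included, is a design assumption rather than part of the original answer:

```python
import string

s = "I am 20 years old. I live in New York in United States of America."
stop = {"am", "old", "in", "of"}  # a set, for fast membership tests

# Compare each word without leading/trailing punctuation, but keep the
# original token (punctuation included) when it is not a stop word.
kept = [word for word in s.split()
        if word.strip(string.punctuation) not in stop]
cleaned = ' '.join(kept)
print(cleaned)
```

With this variant the sentence-ending period after "old" disappears together with the word, so it suits cases where the result is processed word by word rather than read as prose.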

Python: remove the strings in a list from a string
