I have a large string and a large list of stop words. I created a small example below.
s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
As you can imagine, I want the words in stop removed from s. I tried this:
for word in stop:
    s = s.replace(word, "")
I get this error:
AttributeError: 'list' object has no attribute 'replace'
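The traceback means s is already a list, not a string, when replace is called. The question does not show how that happened, so the following minimal reproduction assumes (hypothetically) that an earlier line rebound s with s.split():

```python
# Hypothetical reproduction: the question does not show where s became a list.
s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]

s = s.split()  # assumed earlier step that rebinds s to a list of words

try:
    for word in stop:
        s = s.replace(word, "")  # lists have no replace method
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # prints: 'list' object has no attribute 'replace'
```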
Answer 0 (score: 0)
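You need to do the following. Split s on spaces into a list of words. Then build a hash from the stop word list. Then iterate over the list and keep each word that is not in the hash.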
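s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
arr = s.split(' ')  # split on single spaces into a list of words
h = {i: 1 for i in stop}  # dict used as a hash for fast membership tests
result = []
for i in arr:
    if i not in h:  # keep only words that are not stop words
        result.append(i)
print(' '.join(result))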
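The dict h above is only used for membership tests, so a plain set does the same job with less ceremony. A minimal sketch of that variant, with remove_stop_words as a hypothetical helper name:

```python
def remove_stop_words(text, stop_words):
    """Drop whole space-separated tokens that appear in stop_words."""
    stop_set = set(stop_words)  # same O(1) average lookups as the dict h
    return ' '.join(word for word in text.split(' ') if word not in stop_set)


s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
print(remove_stop_words(s, stop))
# I 20 years old. I live New York United States America.
```

Note that "old." is kept because the token includes the trailing period, exactly as in the loop above.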
Answer 1 (score: 0)
Your s is a list, so you probably changed s somewhere and it is now a list instead of a string.
This code works fine:
s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
for word in stop:
    s = s.replace(word, "")
Try to find where s is modified: search your code for an assignment to s.
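One caveat with the replace loop: str.replace matches substrings, not whole words. For the question's sentence this happens to be harmless, but other text fares worse. A sketch using a made-up sentence, with a word-boundary regular expression from the standard re module as one possible fix (a substitution of mine, not part of the answer):

```python
import re

stop = ["am", "old", "in", "of"]
text = "I am old but live inside a golden house"

# str.replace works on raw substrings, so it also cuts pieces out of longer words.
naive = text
for word in stop:
    naive = naive.replace(word, "")
print(naive.split())  # ['I', 'but', 'live', 'side', 'a', 'gen', 'house']

# \b anchors each alternative to word boundaries, so only whole words match.
pattern = r"\b(?:" + "|".join(map(re.escape, stop)) + r")\b"
cleaned = " ".join(re.sub(pattern, "", text).split())  # collapse leftover spaces
print(cleaned)  # I but live inside a golden house
```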
Answer 2 (score: 0)
The most elegant way is to use a set difference:
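z = list(set(s.split()) - set(stop))
print(z)
This prints something like the following (sets are unordered, so the order may differ, and the repeated "I" appears only once):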
['United', '20', 'I', 'live', 'years', 'States', 'America.', 'York', 'New', 'old.']
Unit test:
import unittest


def so_26944574(string):
    stop = ["am", "old", "in", "of"]
    z = list(set(string.split()) - set(stop))
    return sorted(z)


# Unit Test
class Test(unittest.TestCase):
    def testcase(self):
        self.assertEqual(so_26944574("I am 20 years old. I live in New York in United States of America."), sorted(['United', '20', 'I', 'live', 'years', 'States', 'America.', 'York', 'New', 'old.']))
        self.assertEqual(so_26944574("I am very old but still strong, kind of"), sorted(['I', 'very', 'but', 'still', 'strong,', 'kind']))


unittest.main()
The test passes:
Ran 1 test in 0.000s
OK
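Because a set has no order, the set-difference result comes out in arbitrary order with duplicates collapsed. If you want the same unique words but in their original order, one option is the sketch below; unique_non_stop_words is a hypothetical helper name, and it relies on dicts preserving insertion order (guaranteed since Python 3.7):

```python
def unique_non_stop_words(text, stop_words):
    """Like the set-difference answer, but keeps words in first-seen order."""
    stop_set = set(stop_words)
    # dict.fromkeys drops duplicate keys while keeping insertion order.
    return list(dict.fromkeys(w for w in text.split() if w not in stop_set))


s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
print(unique_non_stop_words(s, stop))
# ['I', '20', 'years', 'old.', 'live', 'New', 'York', 'United', 'States', 'America.']
```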
Answer 3 (score: 0)
Another way to do it:
s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
s_list = s.split() # turn string into list
s = ' '.join([word for word in s_list if word not in stop]) # Make new string
>>> s
'I 20 years old. I live New York United States America.'
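In this output "old." survives because each token is compared exactly, trailing period included. If attached punctuation should not protect a stop word, one sketch (an assumption about the desired behavior; the helper name is hypothetical) strips string.punctuation from each token before the lookup while still emitting the original token:

```python
import string


def remove_stop_words_ignoring_punct(text, stop_words):
    """Drop tokens whose punctuation-stripped form is a stop word."""
    stop_set = set(stop_words)
    return ' '.join(
        word for word in text.split()
        # "old." -> "old" for the comparison only; kept tokens are unchanged
        if word.strip(string.punctuation) not in stop_set
    )


s = "I am 20 years old. I live in New York in United States of America."
stop = ["am", "old", "in", "of"]
print(remove_stop_words_ignoring_punct(s, stop))
# I 20 years I live New York United States America.
```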