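I'm new to Python programming. I have two lists: the first contains stop words, and the other contains a text document. I want to replace the stop words in the text document with "/". Can anyone help?
I used the replace function, and it gave an error: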
text = "This is an example showing off word filtration"
stop = set(stopwords.words("english"))
text = nltk.word_tokenize(document)
for word in stop:
    text = text.replace(stop, "/")
print(text)
It should output "/ / / example showing / word filtration"
Answer 0: (score: 1)
>>> from nltk.corpus import stopwords
>>> from nltk.tokenize import word_tokenize
>>> stop_words = set(stopwords.words('english'))
>>> text = "This is an example showing off word filtration"
>>> text_tokens = word_tokenize(text)
>>> replaced_text_words = ["/" if word.lower() in stop_words else word for word in text_tokens]
>>> replaced_text_words
['/', '/', '/', 'example', 'showing', '/', 'word', 'filtration']
>>> replaced_sentence = " ".join(replaced_text_words)
>>> replaced_sentence
'/ / / example showing / word filtration'
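The same idea can be wrapped in a small reusable function. This is only a sketch: the name `replace_stop_words` and the tiny `STOP_WORDS` set are made up for illustration (a stand-in for `set(stopwords.words('english'))`), and it splits on whitespace instead of calling `word_tokenize`, so it runs without downloading any NLTK data.

```python
# Hypothetical stand-in for set(stopwords.words('english')); not the full NLTK list.
STOP_WORDS = {"this", "is", "an", "off", "the", "a"}

def replace_stop_words(text, stop_words=STOP_WORDS, placeholder="/"):
    """Return text with each whitespace-separated stop word replaced by placeholder."""
    # Plain whitespace split; word_tokenize would also separate punctuation.
    tokens = text.split()
    # Compare in lower case so "This" matches the stop word "this".
    replaced = [placeholder if tok.lower() in stop_words else tok for tok in tokens]
    return " ".join(replaced)

print(replace_stop_words("This is an example showing off word filtration"))
# / / / example showing / word filtration
```

Keeping the stop words in a set makes each membership check cheap, which matters once the document is long.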
Answer 1: (score: 1)
How about using a regular expression pattern?
Your code would look like this:
from nltk.corpus import stopwords
import nltk
text = "This is an example showing off word filtration"
text = text.lower()
import re
pattern = re.compile(r'\b(' + r'|'.join(stopwords.words('english')) + r')\b\s*')
text = pattern.sub('/ ', text)
Related to this post.
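A possible variation on the regex approach, offered as a sketch rather than a drop-in replacement: it escapes each stop word with `re.escape` (the English list contains entries with apostrophes), matches case-insensitively instead of lower-casing the whole text, and replaces only the word itself so the original spacing is kept. `STOP_WORDS` and `mask_stop_words` are hypothetical names, and the short list stands in for `stopwords.words('english')`.

```python
import re

# Hypothetical stand-in for stopwords.words('english').
STOP_WORDS = ["this", "is", "an", "off"]

# \b on both sides keeps "is" from matching inside "This" or "history".
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in STOP_WORDS) + r")\b",
    re.IGNORECASE,
)

def mask_stop_words(text):
    """Replace every whole-word stop word in text with "/"."""
    return pattern.sub("/", text)

print(mask_stop_words("This is an example showing off word filtration"))
# / / / example showing / word filtration
```

Because the text is never lower-cased, words that are not stop words keep their original capitalization.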
Answer 2: (score: 0)
You should use word in the replace function instead of stop.
for word in stop:
    text = text.replace(word, "/")
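One caveat with the loop above: `str.replace` works on substrings, not whole words, so a stop word such as "is" is also replaced inside "This". It also only works while `text` is still a string; after `word_tokenize`, `text` is a list, and lists have no `replace` method. The sketch below compares the substring loop with a token-level check, using a hypothetical two-word stop list in place of the NLTK one.

```python
# Hypothetical two-word stop list, just to show the effect.
STOP_WORDS = ["is", "an"]
sentence = "This is an example"

# Substring replacement: also hits the "is" inside "This".
substring_result = sentence
for word in STOP_WORDS:
    substring_result = substring_result.replace(word, "/")

# Token-level replacement: only whole words are compared.
token_result = " ".join(
    "/" if tok.lower() in STOP_WORDS else tok for tok in sentence.split()
)

print(substring_result)  # Th/ / / example
print(token_result)      # This / / example
```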
Answer 3: (score: 0)
You can try
' '.join([item if item.lower() not in stop else "/" for item in text])