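Any idea why my stop-word removal isn't working properly? It replaces things incorrectly: sometimes it strips the letter a out of the middle of other words, and it fails to treat it's as a single word.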
stop_words=open("stopwords.txt")
stop_words=stop_words.read().split("\n")
print stop_words
for line in splitted_tweets:
    #print line
    #print "***************************************"
    if (line.__contains__("text='")):
        start_index=line.index("text='")+6
        end_index=line.index("',", start_index)
        tweet=line[start_index:end_index]
        print tweet
        print "**********"
        tweet_words = re.sub("[^\w]", " " , tweet).split()
        print tweet_words
        for word in stop_words:
            if word in tweet_words:
                print word
                tweet=tweet.replace(word, "")
        print "?????????????????????????"
        print tweet
Here is some sample output:
['RT', 'sayingsforgirls', 'Do', 'not', 'touch', 'MY', 'iPhone', 'It', 's', 'not', 'an', 'usPhone', 'it', 's', 'not', 'a', 'wePhone', 'it', 's', 'not', 'an', 'ourPhone', 'it', 's', 'an', 'iPhone']
a
an
it
not
?????????????????????????
RT @syingsforgirls: Do touch MY iPhone. It's n usPhone, 's wePhone, 's n ourPhone, 's n iPhone.
Do not touch MY iPhone. It's not an usPhone, it's not a wePhone, it's not an ourPhone, it's an iPhone.
**********
['Do', 'not', 'touch', 'MY', 'iPhone', 'It', 's', 'not', 'an', 'usPhone', 'it', 's', 'not', 'a', 'wePhone', 'it', 's', 'not', 'an', 'ourPhone', 'it', 's', 'an', 'iPhone']
a
an
it
not
?????????????????????????
Do touch MY iPhone. It's n usPhone, 's wePhone, 's n ourPhone, 's n iPhone.
RT @BrianaaSymonee: she says imma dog, but it takes one to know one...
**********
['RT', 'BrianaaSymonee', 'she', 'says', 'imma', 'dog', 'but', 'it', 'takes', 'one', 'to', 'know', 'one']
but
it
she
to
?????????????????????????
RT @BrianaaSymonee: says imma dog, takes one know one...
she says imma dog, but it takes one to know one...
**********
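
For what it's worth, the output above looks consistent with tweet.replace(word, "") doing substring replacement: "a" is removed from inside "sayingsforgirls" and "an", and since the tokenizer splits on the apostrophe, "it's" becomes "it" and "s". Below is a minimal sketch of one possible whole-word approach, not a definitive fix. STOP_WORDS, WORD_RE and remove_stop_words are hypothetical names, the stop-word set is a stand-in for the contents of stopwords.txt (which are not shown), and lowercasing before comparison is my own assumption. It is written so that it should also run on Python 3.

```python
import re

# Hypothetical stand-in for the words read from stopwords.txt.
STOP_WORDS = {"a", "an", "it", "it's", "not", "she", "but", "to"}

# A token is a run of word characters that may contain internal apostrophes,
# so "it's" is matched as one token instead of "it" and "s".
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def remove_stop_words(tweet, stop_words=STOP_WORDS):
    """Remove only whole tokens that are stop words (case-insensitive)."""
    def drop(match):
        word = match.group(0)
        # Assumption: compare in lowercase so "It's" matches "it's".
        return "" if word.lower() in stop_words else word

    cleaned = WORD_RE.sub(drop, tweet)
    # Collapse the runs of whitespace left behind by removed words.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


# Substring replacement, as in the original loop, also hits the inside of words:
broken = "an usPhone".replace("a", "")

fixed = remove_stop_words("she says imma dog, but it takes one to know one...")
```

With this sketch, broken is "n usPhone", while fixed keeps every non-stop word intact. Punctuation next to a removed word stays where it was, so some cleanup of stray commas or periods may still be wanted.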