Stop word removal not working properly

Time: 2015-10-28 16:46:56

Tags: python regex string split stop-words

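Any idea why my stop word removal isn't working properly? It replaces things incorrectly: sometimes it deletes the letter "a" from inside other words, and it doesn't treat "it's" as a single word.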
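Both symptoms can be reproduced in isolation. The sketch below is written in Python 3 (the question's code is Python 2) and uses a shortened copy of the first sample tweet. It shows that `str.replace` works on raw substrings rather than whole words, and that replacing every non-word character with a space splits "It's" into two tokens.

```python
import re

# Shortened copy of the first sample tweet from the question.
tweet = "RT @sayingsforgirls: Do not touch MY iPhone. It's not an usPhone"

# str.replace removes every occurrence of the substring, so deleting the
# stop word "a" also strips the 'a' inside "sayingsforgirls" and "an".
substring_result = tweet.replace("a", "")

# [^\w] matches the apostrophe too, so "It's" becomes "It" and "s".
tokens = re.sub(r"[^\w]", " ", tweet).split()

print(substring_result)
print(tokens)
```

The first line printed is `RT @syingsforgirls: Do not touch MY iPhone. It's not n usPhone`, which matches the damaged text in the sample output.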

import re

# Load one stop word per line from stopwords.txt
with open("stopwords.txt") as f:
    stop_words = f.read().split("\n")
print stop_words

# splitted_tweets (built earlier, not shown) holds raw lines containing text='...'
for line in splitted_tweets:
    if "text='" in line:
        # Extract the tweet text between text=' and the next ',
        start_index = line.index("text='") + 6
        end_index = line.index("',", start_index)
        tweet = line[start_index:end_index]
        print tweet
        print "**********"
        # Replace every non-word character with a space, then split into words
        tweet_words = re.sub(r"[^\w]", " ", tweet).split()
        print tweet_words
        for word in stop_words:
            if word in tweet_words:
                print word
                # Remove every occurrence of the stop word string from the tweet text
                tweet = tweet.replace(word, "")

        print "?????????????????????????"
        print tweet
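One way to avoid substring damage is to filter tokens instead of calling `replace` on the whole string. This is a minimal Python 3 sketch, not the asker's code. It assumes the stop words are lowercase, one per line, and that a contraction such as "it's" should count as one word. The tokenizer regex `\w+(?:'\w+)*` and the helper names `load_stop_words` and `remove_stop_words` are illustrative choices. Punctuation and the `@` of mentions are dropped, because the result is rebuilt from tokens.

```python
import re

# Word characters, optionally joined by apostrophes, so "It's" is one token.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*")


def load_stop_words(path):
    """Hypothetical helper: read one stop word per line into a set."""
    with open(path, encoding="utf-8") as f:
        # strip() also drops a stray '\r'; blank lines are skipped.
        return {line.strip().lower() for line in f if line.strip()}


def remove_stop_words(tweet, stop_words):
    """Return the tweet's tokens with whole-word stop words removed."""
    tokens = TOKEN_RE.findall(tweet)
    # Compare case-insensitively, but keep the original spelling.
    return [tok for tok in tokens if tok.lower() not in stop_words]


# Stand-in for the contents of stopwords.txt.
STOP_WORDS = {"a", "an", "it", "not", "but", "she", "to"}

tweet = "she says imma dog, but it takes one to know one..."
print(" ".join(remove_stop_words(tweet, STOP_WORDS)))
```

With these stop words, the sketch prints `says imma dog takes one know one`. A set makes each membership check O(1), and "it's" survives because only the exact token "it" is in the set. Add "it's" to the file if it should be removed too.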

Here is some sample output:

['RT', 'sayingsforgirls', 'Do', 'not', 'touch', 'MY', 'iPhone', 'It', 's', 'not', 'an', 'usPhone', 'it', 's', 'not', 'a', 'wePhone', 'it', 's', 'not', 'an', 'ourPhone', 'it', 's', 'an', 'iPhone']
a
an
it
not
?????????????????????????
RT @syingsforgirls: Do  touch MY iPhone. It's  n usPhone, 's   wePhone, 's  n ourPhone, 's n iPhone.
Do not touch MY iPhone. It's not an usPhone, it's not a wePhone, it's not an ourPhone, it's an iPhone.
**********
['Do', 'not', 'touch', 'MY', 'iPhone', 'It', 's', 'not', 'an', 'usPhone', 'it', 's', 'not', 'a', 'wePhone', 'it', 's', 'not', 'an', 'ourPhone', 'it', 's', 'an', 'iPhone']
a
an
it
not
?????????????????????????
Do  touch MY iPhone. It's  n usPhone, 's   wePhone, 's  n ourPhone, 's n iPhone.
RT @BrianaaSymonee: she says imma dog, but it takes one to know one...
**********
['RT', 'BrianaaSymonee', 'she', 'says', 'imma', 'dog', 'but', 'it', 'takes', 'one', 'to', 'know', 'one']
but
it
she
to
?????????????????????????
RT @BrianaaSymonee:  says imma dog,   takes one  know one...
she says imma dog, but it takes one to know one...
**********
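If the printed tweet should keep its punctuation and `@mentions`, another option is to delete only whole-word matches with a regular expression. In this Python 3 sketch, the helper names `build_stop_word_pattern` and `strip_stop_words` are hypothetical. It also assumes apostrophes should count as part of a word, so the "it" and "s" in "it's" are never matched on their own. Lookarounds are used instead of `\b` for that reason.

```python
import re


def build_stop_word_pattern(stop_words):
    # re.escape guards against stop words with regex metacharacters;
    # longest words first so the alternation tries them before prefixes.
    alternatives = "|".join(
        re.escape(w) for w in sorted(stop_words, key=len, reverse=True)
    )
    # A match must not touch a word character or an apostrophe on either side.
    return re.compile(r"(?<![\w'])(?:" + alternatives + r")(?![\w'])", re.IGNORECASE)


def strip_stop_words(text, pattern):
    without = pattern.sub("", text)
    # Collapse the runs of spaces left behind by deleted words.
    return re.sub(r" {2,}", " ", without).strip()


pattern = build_stop_word_pattern({"a", "an", "it", "not"})
print(strip_stop_words("Do not touch MY iPhone. It's not an usPhone, it's not a wePhone", pattern))
```

This prints `Do touch MY iPhone. It's usPhone, it's wePhone`. The "a" in "@sayingsforgirls" is left alone because it is surrounded by word characters.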
