How to make bigrams without stopwords

Time: 2017-02-13 15:30:50

Tags: python nltk sentiment-analysis

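I wrote this function to generate bigrams from a string with nltk.bigrams while ignoring stopwords and single letters. However, stopwords and letters still show up in the output. Please help me fix this.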

       def bigramReturner (tweetString, stopWords):
           bigramFeatureVector = []
           tweetStringG = tweetString.lower()
           tweetStringG = tweetString.split()
           for i in tweetStringG:
               i = replaceTwoOrMore(i)
               i = i.strip('\'"?,.')
               val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*[a-zA-Z]+[a-zA-Z0-9]*$", i)
               if (i in stopWords is None):
                   continue
               else:
                   for i in nltk.bigrams(tweetStringG):
                       bigramFeatureVector.append(' '.join(i))
           return bigramFeatureVector
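One reason nothing gets skipped is the way Python reads the `if` condition. `in` and `is` are both comparison operators, so they chain. The snippet below checks this with a made-up stopword list. `stopWords` and `word` here are hypothetical sample values, not the asker's data.

```python
# Hypothetical sample stopword list, for illustration only
stopWords = ["the", "is", "a"]
word = "the"

# Python chains comparison operators, so this is evaluated as
# (word in stopWords) and (stopWords is None)
chained = word in stopWords is None
explicit = (word in stopWords) and (stopWords is None)

# A list is never None, so the condition is False even for a stopword
print(chained, explicit)   # False False

# The intended membership test on its own
print(word in stopWords)   # True
```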

1 Answer:

Answer 0 (score: 0)

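Try removing the `is None` check. Python treats `i in stopWords is None` as the chained comparison `(i in stopWords) and (stopWords is None)`. Because `stopWords` is a list and not `None`, the condition is always `False`, so no word is ever skipped.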

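   import re
   import nltk

   def bigramReturner(tweetString, stopWords):
       bigramFeatureVector = []
       # Lowercase first, then split (assigning tweetString.split() discarded the lowercased copy)
       tweetStringG = tweetString.lower().split()
       filteredWords = []
       for i in tweetStringG:
           i = replaceTwoOrMore(i)  # your own helper, defined elsewhere
           i = i.strip('\'"?,.')
           val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*[a-zA-Z]+[a-zA-Z0-9]*$", i)
           # Skip stopwords, and tokens the regex rejects (e.g. single letters)
           if i in stopWords or val is None:
               continue
           filteredWords.append(i)
       # Build the bigrams once, from the filtered words only, not from the full token list
       for pair in nltk.bigrams(filteredWords):
           bigramFeatureVector.append(' '.join(pair))
       return bigramFeatureVector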
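There are two reasonable ways to drop stopwords from bigram features. The first is to filter the tokens and then pair what is left. The second is to pair the original tokens and discard any pair that contains an unwanted token. They give different results, because filtering first makes two words adjacent when a stopword originally sat between them.

The sketch below compares the two approaches. Note the following assumptions:

- The stopword set, the sample tweet and the function names are hypothetical.
- It skips the asker's `replaceTwoOrMore` step.
- It uses `zip(tokens, tokens[1:])`, which yields the same consecutive pairs as `nltk.bigrams(tokens)`, so it runs without NLTK installed.

```python
import re

# Hypothetical stopword set; "a" is left out so the regex has to reject it
STOP_WORDS = {"the", "is", "to"}
# Same pattern as in the question: at least two characters, starting with a letter
WORD_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9]*[a-zA-Z]+[a-zA-Z0-9]*$")


def clean_tokens(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    return [t.strip('\'"?,.') for t in text.lower().split()]


def keep(token):
    """True if the token is not a stopword and matches the word pattern."""
    return token not in STOP_WORDS and WORD_RE.match(token) is not None


def bigrams_of_adjacent_words(text):
    """Pair the original tokens, then drop pairs containing an unwanted token."""
    tokens = clean_tokens(text)
    # zip(tokens, tokens[1:]) yields the same consecutive pairs as nltk.bigrams(tokens)
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:]) if keep(a) and keep(b)]


def bigrams_after_filtering(text):
    """Filter tokens first, then pair: words split by a stopword become adjacent."""
    words = [t for t in clean_tokens(text) if keep(t)]
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


tweet = "The movie is a great way to relax, really great fun."
print(bigrams_of_adjacent_words(tweet))
# ['great way', 'relax really', 'really great', 'great fun']
print(bigrams_after_filtering(tweet))
# ['movie great', 'great way', 'way relax', 'relax really', 'really great', 'great fun']
```

Which variant fits better depends on whether the features should reflect the tweet's real word order. For example, "movie great" and "way relax" only appear in the second list, where the tweet never had those words side by side.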