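I wrote this function to generate bigrams from a string with nltk.bigrams while ignoring stop words and single letters, but the stop words and letters still appear in the output. Please help me fix this.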
def bigramReturner (tweetString, stopWords):
    bigramFeatureVector = []
    tweetStringG = tweetString.lower()
    tweetStringG = tweetString.split()
    for i in tweetStringG:
        i =replaceTwoOrMore(i)
        i =i.strip('\'"?,.')
        val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*[a-zA-Z]+[a-zA-Z0-9]*$", i)
        if(i in stopWords is None):
            continue
        else:
            for i in nltk.bigrams(tweetStringG):
                bigramFeatureVector.append(' '.join(i))
    return bigramFeatureVector
Answer 0 (score: 0)
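Try removing the `is None` check. As written, `i in stopWords is None` is a chained comparison, which Python evaluates as `(i in stopWords) and (stopWords is None)`. For an actual stop-word list that is always False, so the `continue` branch never runs and no word is ever skipped.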
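To see why the `is None` check matters, here is a minimal sketch of how Python evaluates that condition. The stop-word list and the sample word are made up for illustration; only the expression itself comes from the question.

```python
stop_words = ["the", "a"]   # hypothetical stop-word list
word = "the"

# What the question wrote: a chained comparison
chained = word in stop_words is None
# How Python reads that chain
expanded = (word in stop_words) and (stop_words is None)
# The membership test that was presumably intended
plain = word in stop_words

print(chained, expanded, plain)   # False False True
```

Even though `"the"` is in the list, the chained form is False, so a `continue` guarded by it would never fire.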
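import re
import nltk

def bigramReturner(tweetString, stopWords):
    filteredTokens = []
    # lower-case first, then split, so the lower-cased text is what gets tokenized
    for token in tweetString.lower().split():
        token = replaceTwoOrMore(token)  # helper from the question, defined elsewhere
        token = token.strip('\'"?,.')
        val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*[a-zA-Z]+[a-zA-Z0-9]*$", token)
        # skip stop words and tokens the pattern rejects (e.g. single letters)
        if token in stopWords or not val:
            continue
        filteredTokens.append(token)
    # build the bigrams once, from the filtered tokens only
    return [' '.join(pair) for pair in nltk.bigrams(filteredTokens)]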
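For stop words and single letters to disappear from the result, the bigrams also have to be generated from the tokens that survive filtering, not from the raw split. The following is a rough, self-contained sketch of that difference, not a definitive implementation. The sample tweet, the stop-word set, and the `keep_token` helper are assumptions made for illustration; the `replaceTwoOrMore` step is left out because its definition is not shown in the question. A small fallback stands in for `nltk.bigrams` so the sketch also runs where NLTK is not installed.

```python
import re

try:
    from nltk import bigrams  # the same function the question uses
except ImportError:
    # Fallback that pairs adjacent items, so the sketch runs without NLTK
    def bigrams(tokens):
        tokens = list(tokens)
        return zip(tokens, tokens[1:])

# Pattern from the question: alphanumeric only, starts with a letter,
# and contains at least one further letter (so single letters are rejected)
WORD_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9]*[a-zA-Z]+[a-zA-Z0-9]*$")

def keep_token(token, stop_words):
    """Hypothetical helper: True if token is not a stop word and matches the pattern."""
    return token not in stop_words and WORD_PATTERN.search(token) is not None

tweet = "This is a GREAT day, x marks the spot."   # hypothetical sample input
stop_words = {"this", "is", "a", "the"}            # hypothetical stop-word set

tokens = [t.strip('\'"?,.') for t in tweet.lower().split()]

# Bigrams over the raw tokens still contain stop words and the single letter "x"
raw_bigrams = [' '.join(pair) for pair in bigrams(tokens)]

# Filter first, then pair up what is left
filtered = [t for t in tokens if keep_token(t, stop_words)]
clean_bigrams = [' '.join(pair) for pair in bigrams(filtered)]

print(raw_bigrams)
print(clean_bigrams)   # ['great day', 'day marks', 'marks spot']
```

Note that filtering before pairing makes words adjacent that were not adjacent in the original tweet ("day marks" here). Whether that is acceptable depends on how the bigrams are used as features.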