I have a DataFrame with some words in column 0:
stopwords
0
1 a
2 ab
...
10 der
How can I remove these words from the Series obtained with str.lower().str.split(expand=True).stack().value_counts():
Wordcount
die 293107
der 281475
...
so that every word that exactly matches a stopword is removed:
Wordcount
die 293107
....
Answer 0 (score: 1)
Convert the stopwords column to the index, then combine Index.isin with boolean indexing:
stopwords = stopwords.set_index(0)
# no match: keep the words that are not stopwords
s3 = Wordcount[~Wordcount.index.isin(stopwords.index)]
# match: keep only the words that are stopwords
s4 = Wordcount[Wordcount.index.isin(stopwords.index)]
Or pass the column directly to isin, without changing the index of stopwords:
# no match: keep the words that are not stopwords
s3 = Wordcount[~Wordcount.index.isin(stopwords[0])]
# match: keep only the words that are stopwords
s4 = Wordcount[Wordcount.index.isin(stopwords[0])]
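For reference, here is a minimal end-to-end sketch of the column-based variant. The sample sentences and the three stopwords are made up for illustration (the question's real data is not shown in full), and the name is_stop is a hypothetical helper variable; only the pandas calls themselves come from the question and the answer.

```python
import pandas as pd

# Hypothetical sample data; the column label 0 mirrors the question's layout.
stopwords = pd.DataFrame({0: ["a", "ab", "der"]})
text = pd.Series(["Die Katze und der Hund", "der Hund und die Maus"])

# Build the word counts the way the question describes:
# lowercase, split on whitespace, stack into one column, count occurrences.
Wordcount = text.str.lower().str.split(expand=True).stack().value_counts()

# Boolean mask: True where the index label (the word) exactly equals a stopword.
is_stop = Wordcount.index.isin(stopwords[0])

s3 = Wordcount[~is_stop]  # words that are not stopwords
s4 = Wordcount[is_stop]   # words that are stopwords
```

Because isin compares whole index labels, a stopword such as "a" does not remove words that merely contain it (for example "katze"); only exact matches are dropped.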