Remove words listed in a DataFrame column from a Series

Time: 2019-12-02 07:36:20

Tags: python regex pandas

I have a DataFrame with some words in column 0:

stopwords

    0
1   a
2   ab
...
10  der

How can I remove these words from a Series obtained with str.lower().str.split(expand=True).stack().value_counts():

Wordcount
die    293107
der    281475
...

so that every word that exactly matches a stopword is removed:

Wordcount
die    293107
...

1 Answer:

Answer 0: (score: 1)

Convert the stopwords column to the index, then use Index.isin together with boolean indexing:

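# move column 0 into the index
stopwords = stopwords.set_index(0)

# no match: keep the words that are not stopwords
s3 = Wordcount[~Wordcount.index.isin(stopwords.index)]

# match: keep only the words that are stopwords
s4 = Wordcount[Wordcount.index.isin(stopwords.index)]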
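
To try the index-based approach in isolation, here is a minimal sketch. The sample values (including the word "haus" and its count) and the name stopwords_indexed are made up for illustration; only stopwords, Wordcount, s3 and s4 come from the post.

```python
import pandas as pd

# Hypothetical stand-ins for the objects in the question.
stopwords = pd.DataFrame({0: ["a", "ab", "der"]})
Wordcount = pd.Series({"die": 293107, "der": 281475, "haus": 120})

# Move column 0 into the index (stopwords_indexed is an illustrative name).
stopwords_indexed = stopwords.set_index(0)

# Boolean mask: True where a word in Wordcount's index is a stopword.
is_stopword = Wordcount.index.isin(stopwords_indexed.index)

s3 = Wordcount[~is_stopword]  # no match: words that are not stopwords
s4 = Wordcount[is_stopword]   # match: words that are stopwords

print(s3)
print(s4)
```

Index.isin returns a plain boolean array, so ~ inverts it element-wise and the result can be used directly for boolean indexing.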

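Or skip set_index and pass the column directly to isin (this needs the original stopwords DataFrame, where 0 is still a column):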

# no match: keep the words that are not stopwords
s3 = Wordcount[~Wordcount.index.isin(stopwords[0])]

# match: keep only the words that are stopwords
s4 = Wordcount[Wordcount.index.isin(stopwords[0])]
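
For an end-to-end view, the sketch below builds the counts with the same method chain the question mentions and then filters them with the column-based variant. The text Series and its sentences are assumptions, since the question does not show the raw data, and filtered is an illustrative name.

```python
import pandas as pd

# Hypothetical raw text; the question does not show this data.
text = pd.Series(["Die Katze und der Hund", "der Hund und die Maus"])
stopwords = pd.DataFrame({0: ["a", "ab", "der"]})

# Same chain as in the question: lowercase, split on whitespace,
# stack the words into one long Series, then count each word.
Wordcount = text.str.lower().str.split(expand=True).stack().value_counts()

# Keep only the words that do not appear in column 0 of stopwords.
filtered = Wordcount[~Wordcount.index.isin(stopwords[0])]

print(filtered)
```

isin compares whole values and is case-sensitive, so "der" is dropped while a word that merely contains it is kept. Because the counts are built from lowercased text, the stopword list should be lowercase as well.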