Python: remove custom stop words from a pandas DataFrame

Posted: 2016-08-09 10:26:51

Tags: python pandas dataframe

I am following this question: Python remove stop words from pandas dataframe

However, the approach does not work with my own custom stop-word list. Here is my code:

import pandas as pd

pos_tweets = [('I love this car', 'positive'),
              ('This view is amazing', 'positive'),
              ('I feel great this morning', 'positive'),
              ('I am so excited about the concert', 'positive'),
              ('He is my best friend', 'positive')]

test = pd.DataFrame(pos_tweets)
test.columns = ["tweet", "col2"]

# Lowercase each tweet and split it into a list of tokens
test["tweet"] = test["tweet"].str.lower().str.split()

stop = ['love', 'car', 'amazing']

# Keep only the tokens that are not in the stop-word list
test['tweet'].apply(lambda x: [item for item in x if item not in stop])

print(test)

The result is:

                                   tweet      col2
0                       [i, love, this, car]  positive
1                  [this, view, is, amazing]  positive
2            [i, feel, great, this, morning]  positive
3  [i, am, so, excited, about, the, concert]  positive
4                 [he, is, my, best, friend]  positive
The words love, car and amazing are still there. What am I missing?

Thanks!

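1 answer:

Answer 0 (score: 1)

apply returns a new Series instead of modifying test in place, so you need to assign the output back to the tweet column: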

test['tweet'] = test['tweet'].apply(lambda x: [item for item in x if item not in stop])

print(test)
                                       tweet      col2
0                                  [i, this]  positive
1                           [this, view, is]  positive
2            [i, feel, great, this, morning]  positive
3  [i, am, so, excited, about, the, concert]  positive
4                 [he, is, my, best, friend]  positive
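A minimal self-contained sketch of the same fix, reusing the question's data. It shows that calling apply on its own leaves the original column untouched, and that assigning the result back actually removes the stop words. Two details go beyond the original answer and are optional: storing the stop words in a set (a common choice for fast membership tests on longer lists), and the extra column name tweet_text, which is purely illustrative, for joining the tokens back into strings with Series.str.join.

```python
import pandas as pd

pos_tweets = [('I love this car', 'positive'),
              ('This view is amazing', 'positive'),
              ('I feel great this morning', 'positive'),
              ('I am so excited about the concert', 'positive'),
              ('He is my best friend', 'positive')]

test = pd.DataFrame(pos_tweets, columns=['tweet', 'col2'])

# Lowercase and tokenize, as in the question
test['tweet'] = test['tweet'].str.lower().str.split()

# A set gives fast membership checks (optional; a list also works)
stop = {'love', 'car', 'amazing'}

# apply() builds and returns a new Series; test itself is not modified
filtered = test['tweet'].apply(lambda words: [w for w in words if w not in stop])
print(test['tweet'][0])  # prints ['i', 'love', 'this', 'car'] (unchanged)

# Assigning the result back is what actually updates the DataFrame
test['tweet'] = filtered

# Optional: join each token list back into a single string
# ('tweet_text' is a hypothetical column name used for illustration)
test['tweet_text'] = test['tweet'].str.join(' ')
print(test)
```

Series.str.join is used here because each cell holds a list of tokens; it joins the list elements with the given separator.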