Removing stopwords from a dataframe

Time: 2017-04-13 11:49:09

Tags: python pandas

dataframe['Text'] = dataframe['Text'].apply(lambda x: ' '.join([item for item in x.lower().split() if item not in stopwords]))

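I am removing stopwords from a dataframe. The logic works fine, but it throws an error when some rows have an empty (null) Text value.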

I tried dropna(), but it drops the entire row, so the data in the other columns is lost as well.

How can I add a condition to the logic above so that it is only applied where Text is not null?
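One way to add that condition, sketched here under the assumption that missing values show up as None/NaN (that is, anything that is not a str), is to check the type inside the applied function and return missing values unchanged. The sample DataFrame and the helper name `remove_stopwords` are hypothetical:

```python
import pandas as pd

# Hypothetical sample data: the second row has a missing Text value.
dataframe = pd.DataFrame({"Text": ["I love this car", None, "This view is amazing"],
                          "col2": ["a", "b", "c"]})
stopwords = {"love", "car", "amazing"}  # a set gives fast membership tests

def remove_stopwords(text):
    # Hypothetical helper: pass missing values (None/NaN, i.e. not a str) through
    # unchanged, so .lower() is only ever called on real strings.
    if not isinstance(text, str):
        return text
    return " ".join(item for item in text.lower().split() if item not in stopwords)

dataframe["Text"] = dataframe["Text"].apply(remove_stopwords)
print(dataframe)
```

All rows are kept; the row with a missing Text simply stays missing.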

2 answers:

Answer 0 (score: 1)

Use this before your logic:

dataframe = dataframe.dropna(subset=['Text'], how='all')
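Note that dropna still discards every row whose Text is null, together with that row's other columns. If those rows should be kept, a minimal sketch (assuming the same stopword logic and hypothetical sample data) is to select only the non-null rows with a notnull() mask and write the result back with .loc:

```python
import pandas as pd

# Hypothetical sample data: the second row has a missing Text value.
dataframe = pd.DataFrame({"Text": ["I love this car", None, "This view is amazing"],
                          "col2": ["a", "b", "c"]})
stopwords = {"love", "car", "amazing"}

# Boolean mask of the rows whose Text is present.
has_text = dataframe["Text"].notnull()

# Rewrite only the masked rows; rows with a missing Text (and their other
# columns) are left exactly as they were.
dataframe.loc[has_text, "Text"] = dataframe.loc[has_text, "Text"].apply(
    lambda x: " ".join(item for item in x.lower().split() if item not in stopwords)
)
print(dataframe)
```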

Answer 1 (score: 1)

You can replace NaN with an empty list, but it is not straightforward: use mask or combine_first with a Series created from empty lists.

import pandas as pd

pos_tweets = [('I love this car', 'positive'),
              ('This view is amazing', 'positive'),
              ('I feel great this morning', 'positive'),
              ('I am so excited about the concert', 'positive'),
              (None, 'positive')]

df = pd.DataFrame(pos_tweets, columns= ["Text","col2"])
print (df)
                                Text      col2
0                    I love this car  positive
1               This view is amazing  positive
2          I feel great this morning  positive
3  I am so excited about the concert  positive
4                               None  positive

stopwords =  ['love','car','amazing']
s = pd.Series([[]], index=df.index)
df["Text"] = df["Text"].str.lower().str.split().mask(df["Text"].isnull(), s)
print (df)
                                        Text      col2
0                       [i, love, this, car]  positive
1                  [this, view, is, amazing]  positive
2            [i, feel, great, this, morning]  positive
3  [i, am, so, excited, about, the, concert]  positive
4                                         []  positive

df['Text']=df['Text'].apply(lambda x:' '.join([item for item in x if item not in stopwords]))
print (df)
                                Text      col2
0                             i this  positive
1                       this view is  positive
2          i feel great this morning  positive
3  i am so excited about the concert  positive
4                                     positive

Another solution:

stopwords =  ['love','car','amazing']
df["Text"]=df["Text"].str.lower().str.split().combine_first(pd.Series([[]], index=df.index))
print (df)
                                        Text      col2
0                       [i, love, this, car]  positive
1                  [this, view, is, amazing]  positive
2            [i, feel, great, this, morning]  positive
3  [i, am, so, excited, about, the, concert]  positive
4                                         []  positive
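The combine_first variant stops at the list stage, so the join step from the mask version still has to follow. As an alternative sketch (not part of the original answer), filling missing values with an empty string before splitting avoids building a Series of empty lists, because an empty string contributes no words to the joined result:

```python
import pandas as pd

pos_tweets = [('I love this car', 'positive'),
              ('This view is amazing', 'positive'),
              ('I feel great this morning', 'positive'),
              ('I am so excited about the concert', 'positive'),
              (None, 'positive')]
df = pd.DataFrame(pos_tweets, columns=["Text", "col2"])
stopwords = ['love', 'car', 'amazing']

# fillna("") replaces the missing Text with an empty string, so .str.lower()
# and .str.split() never see a null value.
words = df["Text"].fillna("").str.lower().str.split()

# Join the remaining words back into one string per row; the missing row becomes "".
df["Text"] = words.apply(lambda x: ' '.join([item for item in x if item not in stopwords]))
print(df)
```

The design trade-off: the missing row ends up as an empty string rather than staying null, which matches the final output of the mask version above.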