How do I apply TextBlob when some rows have missing column values?

Time: 2018-08-16 19:13:11

Tags: python python-3.x pandas textblob

I have a dataframe that looks like this:

     Text
0    this is amazing
1    nan
2    wow you are great

I want to pass the text in each cell of the dataframe to TextBlob to get its polarity in a new column. However, many rows contain nan.

I think this is causing TextBlob to give a polarity score of 0.0 in the new column for every row (even the rows that do contain text).

How can I run TextBlob.sentiment.polarity on each text in the column and create a new column with the polarity scores?

The new df should look like this:

     Text                 sentiment
0    this is amazing      0.9
1    nan                  0.0
2    wow you are great    0.8

I don't care about the nan rows, so their sentiment value can be either nan or 0.

Current code that doesn't work:

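for text in df.columns:                    # yields column labels such as 'Text', not cell values
    a = TextBlob(text)
    df['sentiment']=a.sentiment.polarity   # one scalar, broadcast to every row
    print(df.value)                        # df has no attribute 'value', so this raises AttributeError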
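Two things in the loop above can be checked without TextBlob at all: iterating over `df.columns` yields the column labels (here just `'Text'`), and assigning a single number to `df['sentiment']` writes that number into every row. So the loop scores the label string rather than the cell contents, and every row receives the same value, which may explain the uniform 0.0 more than the nan rows do. A minimal pandas-only check, using a frame that mirrors the one in the question and `0.0` purely as a placeholder score:

```python
import numpy as np
import pandas as pd

# Same shape as the question's frame
df = pd.DataFrame({"Text": ["this is amazing", np.nan, "wow you are great"]})

# Iterating over df.columns yields the column *labels*, not the cell values.
labels = [label for label in df.columns]
print(labels)  # ['Text']

# Assigning one scalar to a column broadcasts it to every row,
# so every row ends up with the same score.
df["sentiment"] = 0.0
print(df["sentiment"].tolist())  # [0.0, 0.0, 0.0]
```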

Thanks in advance.

Edit:

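To add (not sure whether it makes a difference): the index on df is not reset, because other parts of df are grouped together by the same index numbers.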

3 answers:

Answer 0 (score: 1)

Try this:

>>> import numpy as np
>>> import pandas as pd
>>> from textblob import TextBlob
>>> s = pd.Series(['this is amazing', np.nan, 'wow you are great'], name='Text')
>>> s
0      this is amazing
1                  NaN
2    wow you are great
Name: Text, dtype: object

>>> s.apply(lambda x: np.nan if pd.isnull(x) else TextBlob(x).sentiment.polarity)
0    0.60
1     NaN
2    0.45
Name: Text, dtype: float64
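The same pattern can be wrapped in a small helper and assigned back to the frame as a new column. This is a sketch under stated assumptions: `safe_polarity` is a hypothetical helper name, and `toy_scorer` is a hypothetical stand-in for `TextBlob(x).sentiment.polarity` so the example runs without TextBlob installed; its numbers are not TextBlob scores.

```python
import numpy as np
import pandas as pd


def safe_polarity(text, scorer):
    """Return scorer(text) for real strings and np.nan for missing cells.

    `scorer` is any callable str -> float (hypothetical parameter); with
    TextBlob installed it could be lambda t: TextBlob(t).sentiment.polarity.
    """
    if pd.isnull(text):
        return np.nan
    return scorer(text)


def toy_scorer(text):
    # Hypothetical stand-in for TextBlob, NOT its algorithm:
    # fraction of words that appear in a tiny "positive" word set.
    positive = {"amazing", "great"}
    words = text.split()
    return sum(w in positive for w in words) / len(words)


df = pd.DataFrame({"Text": ["this is amazing", np.nan, "wow you are great"]})
# Series.apply forwards extra keyword arguments (here `scorer`) to the function.
df["sentiment"] = df["Text"].apply(safe_polarity, scorer=toy_scorer)
print(df)
```

With TextBlob available, pass `scorer=lambda t: TextBlob(t).sentiment.polarity` instead. If 0 is preferred over NaN for the empty rows, as the question allows, append `.fillna(0)` to the `apply` result.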

Answer 1 (score: 1)

Another solution:

import numpy as np
import pandas as pd
from textblob import TextBlob

d = {'text': ['text1', 'text2', 'text3', 'text4', 'text5'], 'desc': ['The weather is nice today in my city.', 'I hate this weather.', 'Nice weather today.', 'Perfect weather today.', np.nan]}
df = pd.DataFrame(data=d)
print(df)

    text                                   desc
0  text1  The weather is nice today in my city.
1  text2                   I hate this weather.
2  text3                    Nice weather today.
3  text4                 Perfect weather today.
4  text5                                    NaN

Apply sentiment analysis with TextBlob and add the result as a new column:

df['sentiment'] = df['desc'].apply(lambda x: np.nan if pd.isnull(x) else TextBlob(x).sentiment.polarity)
print(df)

    text                                   desc sentiment
0  text1  The weather is nice today in my city.       0.6
1  text2                   I hate this weather.      -0.8
2  text3                    Nice weather today.       0.6
3  text4                 Perfect weather today.       1.0
4  text5                                    NaN       NaN
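Whether the missing rows get `np.nan` or the string `'NaN'` changes what the column can do afterwards: the string makes the column `object` dtype and pandas does not treat it as missing, while a real `np.nan` keeps the column numeric. A pandas-only sketch reusing two of the scores shown above (no TextBlob involved):

```python
import numpy as np
import pandas as pd

# Missing row marked with the *string* 'NaN': object dtype, not seen as missing.
as_string = pd.Series([0.6, -0.8, "NaN"])
# Missing row marked with a real float NaN: stays float64.
as_float = pd.Series([0.6, -0.8, np.nan])

print(as_string.dtype, as_string.isna().sum())  # object 0
print(as_float.dtype, as_float.isna().sum())    # float64 1

# Numeric operations skip the real NaN by default.
print(as_float.mean())  # roughly -0.1
```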

Answer 2 (score: 0)

If the nan values are the problem, you can apply the function only to the rows where Text is not nan, for example:

from textblob import TextBlob

mask = df['Text'].notnull()  # select the rows without nan
df.loc[mask, 'sentiment'] = df.loc[mask, 'Text'].apply(lambda x: TextBlob(x).sentiment.polarity)

Note: I don't have TextBlob installed, so I am assuming from your code that TextBlob(x).sentiment.polarity works.
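Because `.loc` assignment aligns on index labels, the mask approach also works when the index has not been reset, which was the concern in the question's edit; rows outside the mask are left as NaN in the new column. A self-contained sketch with a deliberately non-contiguous index, where `toy_scorer` is a hypothetical stand-in for `TextBlob(x).sentiment.polarity` that simply counts words:

```python
import numpy as np
import pandas as pd


def toy_scorer(text):
    # Hypothetical stand-in for TextBlob(text).sentiment.polarity so the
    # sketch runs without TextBlob; it just returns the word count as a float.
    return float(len(text.split()))


# Index deliberately not reset (e.g. left over from earlier grouping).
df = pd.DataFrame(
    {"Text": ["this is amazing", np.nan, "wow you are great"]},
    index=[10, 42, 7],
)

mask = df["Text"].notnull()  # True only for rows that contain text
# Creates the 'sentiment' column; unmasked rows get NaN, values align by label.
df.loc[mask, "sentiment"] = df.loc[mask, "Text"].apply(toy_scorer)

print(df)
```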