Pandas: how to count individual words in each row of a DataFrame

Time: 2017-06-19 08:32:46

Tags: python pandas

Suppose I have a Pandas DataFrame like this:

sentences
['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well']
['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence', 'looks', 'like', 'other', 'sentences']

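I am trying to count how many times each word occurs in each row, and to store the counts in a way that I can easily use later when needed. So far I have not managed it, and I would appreciate some help.

Thanks!

1 answer:

Answer 0: (score: 0)

You can use Counter together with the DataFrame constructor, but words that do not occur in a row show up as NaN: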

import pandas as pd
from collections import Counter

print (type(df.loc[0, 'sentences']))
<class 'list'>

df1 = pd.DataFrame([Counter(x) for x in df['sentences']])
print (df1)
     a  and  another   as  is  like  looks  one  other  sentence  sentences  \
0  1.0    1      NaN  1.0   1   NaN    NaN  1.0    NaN         1        NaN   
1  NaN    1      1.0  NaN   1   1.0    1.0  NaN    1.0         2        1.0   

   this  well  
0     2   1.0  
1     2   NaN  

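If you need to replace the NaNs with 0, add DataFrame.fillna (and cast back to int, since the NaNs turn those columns into floats):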

df1 = pd.DataFrame([Counter(x) for x in df['sentences']]).fillna(0).astype(int)
print (df1)
   a  and  another  as  is  like  looks  one  other  sentence  sentences  \
0  1    1        0   1   1     0      0    1      0         1          0   
1  0    1        1   0   1     1      1    0      1         2          1   

   this  well  
0     2     1  
1     2     0
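
The snippets above assume that df already exists. As a self-contained sketch, the following rebuilds the sample data from the question and shows one way the resulting counts might be looked up afterwards. The names counts and first_row are illustrative and not part of the original answer, and the column order may differ from the output shown above depending on the pandas version.

```python
from collections import Counter

import pandas as pd

# Sample data mirroring the question: one list of words per row.
df = pd.DataFrame({
    "sentences": [
        ["this", "is", "a", "sentence", "and", "this", "one", "as", "well"],
        ["this", "is", "another", "sentence", "and", "this", "sentence",
         "looks", "like", "other", "sentences"],
    ]
})

# Build one Counter per row; words absent from a row become NaN,
# which fillna(0) turns into 0 before casting back to integers.
counts = (
    pd.DataFrame([Counter(words) for words in df["sentences"]], index=df.index)
    .fillna(0)
    .astype(int)
)

# Look up how often a single word occurs in a given row.
print(counts.loc[1, "sentence"])

# Keep only the words that actually occur in the first row.
first_row = counts.loc[0]
print(first_row[first_row > 0].to_dict())
```

Because counts shares its index with df, a row of counts can be matched back to its original sentence by label.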