Question

Suppose I have a Pandas DataFrame like this:

sentences
['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well']
['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence', 'looks', 'like', 'other', 'sentences']

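I'm trying to count how many times each word appears in each row, and to store the counts so I can easily use them whenever I need to. So far I haven't managed it, and I'd appreciate some help.

Thanks!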
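One minimal way to keep the per-row counts around, assuming each cell holds a Python list of strings as shown, is to store a `collections.Counter` next to each list. The column name `counts` is a hypothetical choice for this sketch:

```python
from collections import Counter

import pandas as pd

# Rebuild the example frame: each cell in 'sentences' is a list of words
df = pd.DataFrame({
    'sentences': [
        ['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well'],
        ['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence',
         'looks', 'like', 'other', 'sentences'],
    ]
})

# Hypothetical 'counts' column: one Counter (a dict subclass) per row
df['counts'] = [Counter(words) for words in df['sentences']]

# Look up a word in a given row; a Counter returns 0 for missing words
print(df.loc[1, 'counts']['sentence'])  # 2
print(df.loc[0, 'counts']['another'])   # 0
```

Because a Counter returns 0 for absent keys instead of raising `KeyError`, lookups stay simple even for words that never occur in that row.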

Answer 1

You can use Counter together with the DataFrame constructor; words that don't occur in a row show up as NaN:

import pandas as pd
from collections import Counter

# each cell holds a Python list of words
print(type(df.loc[0, 'sentences']))
<class 'list'>

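# one Counter per row; the constructor turns the keys into columns,
# and words missing from a row become NaN
df1 = pd.DataFrame([Counter(x) for x in df['sentences']])
print(df1)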
     a  and  another   as  is  like  looks  one  other  sentence  sentences  \
0  1.0    1      NaN  1.0   1   NaN    NaN  1.0    NaN         1        NaN   
1  NaN    1      1.0  NaN   1   1.0    1.0  NaN    1.0         2        1.0   

   this  well  
0     2   1.0  
1     2   NaN
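A self-contained version of the snippet above, assuming `df` is built from the two example rows. Depending on the pandas version, the columns may come out in order of first appearance rather than alphabetically, so this sketch sorts them explicitly with `sort_index(axis=1)`:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'sentences': [
        ['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well'],
        ['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence',
         'looks', 'like', 'other', 'sentences'],
    ]
})

# One Counter per row; each distinct word becomes a column,
# and words that never occur in a row are filled with NaN
df1 = pd.DataFrame([Counter(x) for x in df['sentences']])

# Sort the columns so the layout does not depend on the pandas version
df1 = df1.sort_index(axis=1)
print(df1)
```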

If you need to replace the NaNs with 0, add DataFrame.fillna:

# replace NaN with 0, then cast the float columns back to int
df1 = pd.DataFrame([Counter(x) for x in df['sentences']]).fillna(0).astype(int)
print(df1)
   a  and  another  as  is  like  looks  one  other  sentence  sentences  \
0  1    1        0   1   1     0      0    1      0         1          0   
1  0    1        1   0   1     1      1    0      1         2          1   

   this  well  
0     2     1  
1     2     0
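An alternative that is not part of the answer above, sketched under the assumption of pandas 0.25 or newer (which added `Series.explode`): flatten the lists to one word per row, then let `pd.crosstab` count words per original row, which fills absent words with 0 directly. The labels `row`, `word`, and `counts` are arbitrary names chosen for this sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'sentences': [
        ['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well'],
        ['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence',
         'looks', 'like', 'other', 'sentences'],
    ]
})

# Long format: one (original row, word) pair per line
long = df['sentences'].explode().rename_axis('row').reset_index(name='word')

# Cross-tabulate rows against words; combinations that never occur count as 0
counts = pd.crosstab(long['row'], long['word'])
print(counts)
```

This yields the same integer table as the `fillna(0).astype(int)` version, with the columns in sorted order.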

Pandas: How to count the individual words in each row of a DataFrame
