Suppose I have a Pandas DataFrame like this:
sentences
['this', 'is', 'a', 'sentence', 'and', 'this', 'one', 'as', 'well']
['this', 'is', 'another', 'sentence', 'and', 'this', 'sentence', 'looks', 'like', 'other', 'sentences']
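I am trying to count how many times each word occurs in each row, and to store the counts in a form I can easily reuse later. So far I have not managed it, and I would appreciate some help.
Thanks!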
Answer 0 (score: 0)
You can use Counter together with the DataFrame constructor, but words that do not occur in a row come out as NaN:
from collections import Counter
import pandas as pd
print (type(df.loc[0, 'sentences']))
<class 'list'>
df1 = pd.DataFrame([Counter(x) for x in df['sentences']])
print (df1)
     a  and  another   as  is  like  looks  one  other  sentence  sentences  \
0  1.0    1      NaN  1.0   1   NaN    NaN  1.0    NaN         1        NaN
1  NaN    1      1.0  NaN   1   1.0    1.0  NaN    1.0         2        1.0

   this  well
0     2   1.0
1     2   NaN
If you need to replace the NaN values with 0, add DataFrame.fillna (followed by astype(int) to get integer counts back):
df1 = pd.DataFrame([Counter(x) for x in df['sentences']]).fillna(0).astype(int)
print (df1)
   a  and  another  as  is  like  looks  one  other  sentence  sentences  \
0  1    1        0   1   1     0      0    1      0         1          0
1  0    1        1   0   1     1      1    0      1         2          1

   this  well
0     2     1
1     2     0
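
For reference, here is a minimal self-contained sketch of the same approach. It assumes the sentences column holds Python lists of words, as the type check above suggests. The sample df is rebuilt from the question, and the names counts, this_in_row0 and sentence_in_row1 are only illustrative.

```python
from collections import Counter

import pandas as pd

# Sample data rebuilt from the question; each cell holds a list of words.
df = pd.DataFrame({
    "sentences": [
        ["this", "is", "a", "sentence", "and", "this", "one", "as", "well"],
        ["this", "is", "another", "sentence", "and", "this", "sentence",
         "looks", "like", "other", "sentences"],
    ]
})

# One Counter per row; words missing from a row become NaN, which is
# then replaced by 0 and cast back to integers.
counts = (
    pd.DataFrame([Counter(words) for words in df["sentences"]], index=df.index)
    .fillna(0)
    .astype(int)
)

# Look up how often a given word occurs in a given row.
this_in_row0 = counts.loc[0, "this"]
sentence_in_row1 = counts.loc[1, "sentence"]
print(this_in_row0, sentence_in_row1)
```

Because counts reuses the index of df, each row of counts lines up with the original sentence, so a single count can be read with counts.loc[row, word].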