For example, I have the following DataFrame:
   PassengerId  Survived  Pclass
0            1         0       3
1            2         1       1
2            3         1       3
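After calling df.value_counts(), I would like to get the value_counts() of every column without having to specify one column at a time. The result might look like this: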
1    1
2    1
3    1
Name: PassengerId, dtype: int64
0    1
1    2
Name: Survived, dtype: int64
3    2
1    1
Name: Pclass, dtype: int64
I would like to know how to do this.
Can anyone help me?
Thanks in advance.
Answer 0 (score: 3)
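There are two solutions that apply a function to each column with DataFrame.apply, but the per-column results are aligned on the union of their indexes, so NaNs are added wherever a value does not occur in a column (which also turns the counts into floats):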
df1 = df.apply(pd.value_counts)
print (df1)
   PassengerId  Survived  Pclass
0          NaN       1.0     NaN
1          1.0       2.0     1.0
2          1.0       NaN     NaN
3          1.0       NaN     2.0
df1 = df.apply(pd.Series.value_counts)
print (df1)
   PassengerId  Survived  Pclass
0          NaN       1.0     NaN
1          1.0       2.0     1.0
2          1.0       NaN     NaN
3          1.0       NaN     2.0
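If a wide table is acceptable and a missing value can simply be read as "occurs zero times", one option is to fill the NaNs with 0 and cast back to integers. The following is a minimal sketch under that assumption; the DataFrame is rebuilt from the sample data in the question, and `pd.Series.value_counts` is used rather than the top-level `pd.value_counts`, which, as far as I know, has been deprecated in recent pandas releases.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
})

# Count values per column; values absent from a column show up as NaN
counts = df.apply(pd.Series.value_counts)

# Read "value never occurs" as a count of 0 and restore an integer dtype
counts = counts.fillna(0).astype(int)
print(counts)
```

The trade-off is that a 0 here no longer distinguishes "not present" from a real count, which is harmless for counts but worth keeping in mind.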
To avoid this, you can use SeriesGroupBy.value_counts:
df1 = df.stack().groupby(level=1).value_counts().rename_axis(('a','b')).reset_index(name='c')
print (df1)
             a  b  c
0  PassengerId  1  1
1  PassengerId  2  1
2  PassengerId  3  1
3       Pclass  3  2
4       Pclass  1  1
5     Survived  1  2
6     Survived  0  1
Or combine the original solution with DataFrame.stack:
df1 = (df.apply(pd.Series.value_counts)
         .stack()
         .astype(int)
         .rename_axis(('a','b'))
         .reset_index(name='c'))
print (df1)
   a            b  c
0  0     Survived  1
1  1  PassengerId  1
2  1     Survived  2
3  1       Pclass  1
4  2  PassengerId  1
5  3  PassengerId  1
6  3       Pclass  2
Answer 1 (score: 2)
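Another alternative is to use melt. Note that, as written, this groups by the original row index, so the counts shown are per row; grouping on the melted 'variable' column instead would give per-column counts: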
df.reset_index().melt('index').groupby('index').value.value_counts()
Out[608]:
index  value
0      0        1
       1        1
       3        1
1      1        2
       2        1
2      3        2
       1        1
Name: value, dtype: int64
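Since the question asks for counts per column, a melt-based variant would group on the column name rather than on the row index. The sketch below assumes the sample data from the question and relies on the default melt column names `variable` and `value`; `long_df` and `per_column` are illustrative names only.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
})

# melt() reshapes the frame into two columns: "variable" (the original
# column name) and "value" (the cell content)
long_df = df.melt()

# Group by the original column name and count each value within it
per_column = long_df.groupby("variable")["value"].value_counts()
print(per_column)
```

The result is a single Series with a (column name, value) MultiIndex, so `per_column.loc["Survived"]` returns the counts for one column.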
Answer 2 (score: 1)
You can try the following code:
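import pandas as pd

d = {'PassengerId': pd.Series([1, 2, 3]),
     'Survived': pd.Series([0, 1, 1]),
     'Pclass': pd.Series([3, 1, 3])}
df = pd.DataFrame(d)
print(df)

# Count the values of every column once; values absent from a column become NaN
counts = df.apply(pd.Series.value_counts)

s = []
# Iterate over columns (shape[1]), not rows (shape[0])
for i in range(df.shape[1]):
    # Keep the original index labels and drop values absent from this column
    s.append(pd.Series(counts.values[:, i], index=counts.index).dropna())
print('\nvalue counts each column:')
print(s)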
Output:
   PassengerId  Survived  Pclass
0            1         0       3
1            2         1       1
2            3         1       3

value counts each column:
[1    1.0
2    1.0
3    1.0
dtype: float64, 0    1.0
1    2.0
dtype: float64, 1    1.0
3    2.0
dtype: float64]
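A shorter way to get one Series per column, without going through the NaN-padded table, might be to call value_counts() on each column directly. This is only a sketch using the sample data from the question; the dictionary name `per_column` is illustrative.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
})

# One value_counts() Series per column, keyed by column name
per_column = {col: df[col].value_counts() for col in df.columns}

# Print each column's counts separated by a blank line
for col, counts in per_column.items():
    print(counts, end="\n\n")
```

Because each column is counted on its own, no index alignment takes place and the counts stay integers.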