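I have a DataFrame with one column (called max in the example below) that I want to divide into bins, and I want the count for each bin as a column of a DataFrame, i.e. how many points fall into each bin, starting from 0.
I use the code below for the binning, but I'm not sure how to get the count columns into df.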
import pandas as pd
df = pd.DataFrame({'max': [0.2, 0.3, 1, 1.5, 2.5, 0.2]})
print(df)
max
0 0.2
1 0.3
2 1.0
3 1.5
4 2.5
5 0.2
bins = [0, 0.5, 1, 1.5, 2, 2.5]
x=pd.cut(df['max'], bins)
Desired output:
print(df)
0_0.5_count 0.5_1_count
0 3 1
Answer 0 (score: 1)
First pass the labels parameter to cut, then count with Series.value_counts. To get a DataFrame, use Series.to_frame and transpose the result with DataFrame.T:
bins = [0, 0.5, 1, 1.5, 2, 2.5]
labels = ['{}_{}_count'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
x=pd.cut(df['max'], bins, labels=labels).value_counts().sort_index().to_frame(0).T
print (x)
0_0.5_count 0.5_1_count 1_1.5_count 1.5_2_count 2_2.5_count
0 3 1 1 0 1
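If the goal is instead to keep the original rows and attach each row's bin count to df, the same counts can be mapped back onto the rows. The following is only a sketch of that reading of the question; the column names bin and bin_count are hypothetical and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({'max': [0.2, 0.3, 1, 1.5, 2.5, 0.2]})
bins = [0, 0.5, 1, 1.5, 2, 2.5]
labels = ['{}_{}_count'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]

# Label every row with the bin it falls into.
binned = pd.cut(df['max'], bins, labels=labels)

# Count rows per bin; the dict is keyed by the label strings.
counts = binned.value_counts().to_dict()

# 'bin' and 'bin_count' are illustrative column names (an assumption).
df['bin'] = binned.astype(str)
df['bin_count'] = df['bin'].map(counts)
print(df)
```

Converting the labels to plain strings before mapping avoids relying on how categorical data behaves under map, which differs somewhat between pandas versions.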
Details:
print (pd.cut(df['max'], bins, labels=labels))
0 0_0.5_count
1 0_0.5_count
2 0.5_1_count
3 1_1.5_count
4 2_2.5_count
5 0_0.5_count
Name: max, dtype: category
Categories (5, object): [0_0.5_count < 0.5_1_count < 1_1.5_count < 1.5_2_count < 2_2.5_count]
print (pd.cut(df['max'], bins, labels=labels).value_counts())
0_0.5_count 3
2_2.5_count 1
1_1.5_count 1
0.5_1_count 1
1.5_2_count 0
Name: max, dtype: int64
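In the details above, the value 1.0 lands in 0.5_1_count rather than 1_1.5_count because pd.cut builds right-closed intervals by default, such as (0.5, 1]. One consequence is that a value exactly equal to the lowest edge (0 here) would not be assigned to any bin. The sketch below uses a small made-up Series, not the data from the question, to show the include_lowest and right parameters of pd.cut.

```python
import pandas as pd

# Illustrative values sitting exactly on the bin edges (not from the question).
s = pd.Series([0, 0.5, 1])
edges = [0, 0.5, 1]

# Default: intervals are (0, 0.5] and (0.5, 1], so 0 is left unbinned (NaN).
default = pd.cut(s, edges)

# include_lowest=True makes the first interval include its left edge.
with_lowest = pd.cut(s, edges, include_lowest=True)

# right=False switches to [0, 0.5) and [0.5, 1), so 1 is left unbinned instead.
left_closed = pd.cut(s, edges, right=False)

print(default.isna().tolist())
print(with_lowest.isna().tolist())
print(left_closed.isna().tolist())
```

Whether include_lowest=True is appropriate depends on whether zeros can occur in the data and should be counted in the first bin.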
Alternative solution using GroupBy.size:
bins = [0, 0.5, 1, 1.5, 2, 2.5]
labels = ['{}_{}_count'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
x= df.groupby(pd.cut(df['max'], bins, labels=labels)).size().rename_axis(None).to_frame().T
print (x)
0_0.5_count 0.5_1_count 1_1.5_count 1.5_2_count 2_2.5_count
0 3 1 1 0 1
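One caveat with the groupby variant: whether empty bins such as 1.5_2_count appear in the result depends on the observed argument of groupby when grouping by a categorical. Recent pandas releases warn about, or change, the default for that argument, so passing it explicitly is a safer sketch of the same idea. Everything here except the explicit observed=False comes from the answer above.

```python
import pandas as pd

df = pd.DataFrame({'max': [0.2, 0.3, 1, 1.5, 2.5, 0.2]})
bins = [0, 0.5, 1, 1.5, 2, 2.5]
labels = ['{}_{}_count'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]

# observed=False keeps categories with no rows, so empty bins get a count of 0.
x = (
    df.groupby(pd.cut(df['max'], bins, labels=labels), observed=False)
      .size()
      .rename_axis(None)   # drop the index name 'max'
      .to_frame()          # one column named 0
      .T                   # transpose into a single row
)
print(x)
```

With observed=True the empty 1.5_2_count bin would be dropped from the result instead of being reported as 0.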