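How to make each data bin a column of a dataframe

Asked: 2019-06-27 05:02:43

Tags: python-3.x pandas numpy

I have a dataframe with a column A that I want to divide into bins, and I want the count for each bin to become a column of the dataframe, for example how many points fall in the bin from 0 to 0.5.

I use the code below for binning, but I am not sure how to insert the count columns into df.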

import pandas as pd

df = pd.DataFrame({'max': [0.2, 0.3, 1, 1.5, 2.5, 0.2]})
print(df)
   max
0  0.2
1  0.3
2  1.0
3  1.5
4  2.5
5  0.2

bins = [0, 0.5, 1, 1.5, 2, 2.5]

x = pd.cut(df['max'], bins)

Desired output:

print(df)
   0_0.5_count  0.5_1_count
0            3            1

1 Answer:

Answer 0 (score: 1)

First pass the labels parameter to cut, then count with Series.value_counts. To get a DataFrame, use Series.to_frame and transpose it with DataFrame.T:

bins = [0, 0.5, 1, 1.5, 2, 2.5]

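# One label per bin, built from each pair of neighbouring edges
labels = ['{}_{}_count'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
# Count per bin, restore bin order, then turn the Series into a one-row DataFrame
x = pd.cut(df['max'], bins, labels=labels).value_counts().sort_index().to_frame(0).T
print(x)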

   0_0.5_count  0.5_1_count  1_1.5_count  1.5_2_count  2_2.5_count
0            3            1            1            0            1
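
For reference, the steps above can be gathered into one self-contained script. This is only a sketch: the helper name `bin_counts` is made up for illustration, and the extra `rename_axis(None)` is an assumption about what you want, since newer pandas versions name the `value_counts` index after the original column.

```python
import pandas as pd


def bin_counts(series, bins):
    """Return a one-row DataFrame with one count column per bin (hypothetical helper)."""
    # Labels such as '0_0.5_count', one per pair of neighbouring edges
    labels = ['{}_{}_count'.format(lo, hi) for lo, hi in zip(bins[:-1], bins[1:])]
    # pd.cut uses right-closed intervals by default: (0, 0.5], (0.5, 1], ...
    binned = pd.cut(series, bins, labels=labels)
    # value_counts on a categorical keeps empty bins; sort_index restores bin order
    counts = binned.value_counts().sort_index()
    # rename_axis(None) drops the index name that newer pandas versions attach
    return counts.rename_axis(None).to_frame(0).T


df = pd.DataFrame({'max': [0.2, 0.3, 1, 1.5, 2.5, 0.2]})
bins = [0, 0.5, 1, 1.5, 2, 2.5]
x = bin_counts(df['max'], bins)
print(x)
```

Wrapping the chain in a function makes it easy to reuse with other columns or other bin edges.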

Details:

print (pd.cut(df['max'], bins, labels=labels))
0    0_0.5_count
1    0_0.5_count
2    0.5_1_count
3    1_1.5_count
4    2_2.5_count
5    0_0.5_count
Name: max, dtype: category
Categories (5, object): [0_0.5_count < 0.5_1_count < 1_1.5_count < 1.5_2_count < 2_2.5_count]

print (pd.cut(df['max'], bins, labels=labels).value_counts())
0_0.5_count    3
2_2.5_count    1
1_1.5_count    1
0.5_1_count    1
1.5_2_count    0
Name: max, dtype: int64  
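
One detail worth keeping in mind when reading these counts: by default pd.cut builds right-closed intervals, so a value equal to the first edge (0 here) falls outside every bin and becomes NaN, and is therefore not counted. The sketch below uses made-up edge values (not from the question) to show this, along with the `include_lowest=True` option of pd.cut, which may or may not be what you want.

```python
import pandas as pd

bins = [0, 0.5, 1, 1.5, 2, 2.5]
labels = ['{}_{}_count'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]

# Hypothetical values sitting exactly on bin edges (not part of the original data)
edge_values = pd.Series([0, 0.5, 2.5])

# Default: intervals are (0, 0.5], (0.5, 1], ... so 0 is not in any bin
default_cut = pd.cut(edge_values, bins, labels=labels)
print(default_cut)

# include_lowest=True makes the first interval include its left edge
inclusive_cut = pd.cut(edge_values, bins, labels=labels, include_lowest=True)
print(inclusive_cut)
```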

An alternative solution using GroupBy.size:

bins = [0, 0.5, 1, 1.5, 2, 2.5]

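labels = ['{}_{}_count'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
# observed=False keeps empty bins (such as 1.5_2_count) regardless of the pandas default
x = df.groupby(pd.cut(df['max'], bins, labels=labels), observed=False).size().rename_axis(None).to_frame().T
print(x)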
   0_0.5_count  0.5_1_count  1_1.5_count  1.5_2_count  2_2.5_count
0            3            1            1            0            1
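
If the goal is to have the counts inside the original df rather than in a separate one-row frame, one possible reading of the question is to repeat each count on every row. The following is a sketch under that assumption, not something the answer itself covers; the name `out` is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'max': [0.2, 0.3, 1, 1.5, 2.5, 0.2]})
bins = [0, 0.5, 1, 1.5, 2, 2.5]
labels = ['{}_{}_count'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]

# Count per bin as a Series indexed by the bin labels
counts = pd.cut(df['max'], bins, labels=labels).value_counts().sort_index()

# Each label becomes a new column; the scalar count is broadcast to all rows
out = df.assign(**counts.to_dict())
print(out)
```

Whether repeating the counts on every row is desirable depends on how the frame is used afterwards; keeping them in the separate one-row frame avoids the redundancy.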