Question

I am trying to apply value_counts with bins to the following DataFrame:

import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.randint(0,100,size=(1000, 4)), columns=list('ABCD'))
df2.apply(pd.value_counts, normalize=True, bins=[0,25,50,75,101]).sort_values(by=['A'], ascending=False)

However, when I do this, I get the following error:

ValueError: could not broadcast input array from shape (5) into shape (4)

When I don't use bins, the code works fine.

Answer 1

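This looks like a bug.

For me, though, it works when sort_index is combined with a list comprehension and concat:

L = [df2[x].value_counts(normalize=True, bins=[0,25,50,75,101]).sort_index() for x in df2]
# keys= labels each result with its column name, whatever name value_counts gives the Series
b = pd.concat(L, axis=1, keys=df2.columns).sort_values(by=['A'], ascending=False)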

Or call value_counts followed by sort_index inside apply:

b = df2.apply(lambda x: x.value_counts(normalize=True, bins=[0,25,50,75,101]).sort_index())

print (b)
                    A      B      C      D
(-0.001, 25.0]  0.263  0.273  0.278  0.259
(25.0, 50.0]    0.251  0.254  0.234  0.255
(50.0, 75.0]    0.250  0.257  0.240  0.249
(75.0, 101.0]   0.236  0.216  0.248  0.237
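
The bins argument of value_counts is documented as a convenience for pd.cut, so the binning step can also be written out explicitly. The following is a minimal sketch and not part of the original answer: the helper name binned_share, the edges variable, and the seeded random generator are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Seeded data so the sketch is reproducible (the question uses np.random.randint).
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.integers(0, 100, size=(1000, 4)), columns=list("ABCD"))

edges = [0, 25, 50, 75, 101]  # same bin edges as in the question


def binned_share(col: pd.Series) -> pd.Series:
    """Hypothetical helper: share of the values of `col` that fall in each bin."""
    # include_lowest=True makes the first interval include its left edge (0).
    binned = pd.cut(col, bins=edges, include_lowest=True)
    # sort_index orders the rows by interval instead of by frequency.
    return binned.value_counts(normalize=True).sort_index()


# Passing a dict to concat labels each result with its column name.
b = pd.concat({name: binned_share(df2[name]) for name in df2.columns}, axis=1)
print(b)
```

Because every column is cut with the same edges and then sorted by interval, all four results share one index, and concat only has to place them side by side.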

Answer 2

This is a workaround rather than an exact answer:

In [174]: pd.concat([df2[i].value_counts(normalize=True,bins=[0,25,50,75,101]).sort_index() for i in df2.columns],axis=1,keys=df2.columns)
Out[174]: 
                    A      B      C      D
(-0.001, 25.0]  0.253  0.231  0.238  0.270
(25.0, 50.0]    0.263  0.246  0.260  0.248
(50.0, 75.0]    0.264  0.278  0.239  0.241
(75.0, 101.0]   0.220  0.245  0.263  0.241
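
To see what each row of such a table measures, individual entries can be recomputed by hand. This is a rough sketch, assuming the intervals are closed on the right and the first one also covers its left edge, as the labels (-0.001, 25.0] and (25.0, 50.0] suggest; the seeded data and the variable names are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

# Seeded data so the comparison is reproducible.
rng = np.random.default_rng(1)
df2 = pd.DataFrame(rng.integers(0, 100, size=(1000, 4)), columns=list("ABCD"))
edges = [0, 25, 50, 75, 101]

# Same workaround as above: one value_counts per column, glued together.
shares = pd.concat(
    [df2[c].value_counts(normalize=True, bins=edges).sort_index() for c in df2.columns],
    axis=1,
    keys=df2.columns,  # label each result with its column name
)

a = df2["A"]
first_bin = (a <= 25).mean()                # values 0..25 fall in (-0.001, 25.0]
second_bin = ((a > 25) & (a <= 50)).mean()  # values 26..50 fall in (25.0, 50.0]

print(shares["A"].iloc[0], first_bin)
print(shares["A"].iloc[1], second_bin)
```

Each printed pair should agree, which confirms that the table holds the fraction of a column's values lying in each right-closed interval.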

Dimension problem when using .apply(value_counts, bins=x)
