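I want to create a histogram from my pandas DataFrame. I have one column that stores percentage values. I used value_counts(), but there are too many distinct percentage values. For example: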
0.752 1
0.769 2
0.800 1
0.823 1
...
80.365 1
84.000 1
84.615 1
85.000 10
85.714 1
I need to group these values into intervals of equal width, for example 5% (0 - 4.999, 5.000 - 9.999, ...). I would like a result like this:
(example)
0 - 4.999 24
5 - 9.999 12
10 - 14.999 30
...
Answer 0 (score: 1)
You can group your data by the result of pd.cut():
In [38]: df
Out[38]:
value count
0 0.752 1
1 11.769 3
2 22.800 4
3 33.823 5
4 55.365 1
5 84.000 1
6 84.615 1
7 85.000 10
8 99.714 1
In [39]: df.groupby(pd.cut(df.value, bins=np.linspace(0, 100, 21)))['count'].sum().fillna(0)
Out[39]:
value
(0, 5] 1.0
(5, 10] 0.0
(10, 15] 3.0
(15, 20] 0.0
(20, 25] 4.0
(25, 30] 0.0
(30, 35] 5.0
(35, 40] 0.0
(40, 45] 0.0
(45, 50] 0.0
(50, 55] 0.0
(55, 60] 1.0
(60, 65] 0.0
(65, 70] 0.0
(70, 75] 0.0
(75, 80] 0.0
(80, 85] 12.0
(85, 90] 0.0
(90, 95] 0.0
(95, 100] 1.0
Name: count, dtype: float64
Or you can drop the NaN values:
In [40]: df.groupby(pd.cut(df.value, bins=np.linspace(0, 100, 21)))['count'].sum().dropna()
Out[40]:
value
(0, 5] 1.0
(10, 15] 3.0
(20, 25] 4.0
(30, 35] 5.0
(55, 60] 1.0
(80, 85] 12.0
(95, 100] 1.0
Name: count, dtype: float64
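For readers who want to run this outside an interactive session, here is a minimal standalone sketch of the same grouping. It rebuilds the illustrative df shown above. The names bins, binned and non_empty are introduced here only for illustration. observed=False is passed explicitly on the assumption that empty bins should stay in the result, because the default for this parameter differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Same illustrative data as in the session above.
df = pd.DataFrame({
    "value": [0.752, 11.769, 22.800, 33.823, 55.365, 84.000, 84.615, 85.000, 99.714],
    "count": [1, 3, 4, 5, 1, 1, 1, 10, 1],
})

# 21 edges (0, 5, 10, ..., 100) define 20 bins of width 5.
bins = np.linspace(0, 100, 21)

# Group by the interval each value falls into and sum the counts.
# observed=False keeps bins that contain no rows.
binned = df.groupby(pd.cut(df["value"], bins=bins), observed=False)["count"].sum()

# Keep only the bins that actually contain data.
non_empty = binned[binned > 0]

print(binned)
print(non_empty)
```

In recent pandas versions the sum for an empty bin usually comes back as 0 rather than NaN, so filtering on binned > 0 plays the role that dropna() plays in the session above.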
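Explanation: pd.cut() assigns each value to the interval it falls into, and groupby then uses that interval as the key: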
In [41]: pd.cut(df.value, bins=np.linspace(0, 100, 21))
Out[41]:
0 (0, 5]
1 (10, 15]
2 (20, 25]
3 (30, 35]
4 (55, 60]
5 (80, 85]
6 (80, 85]
7 (80, 85]
8 (95, 100]
Name: value, dtype: category
Categories (20, object): [(0, 5] < (5, 10] < (10, 15] < (15, 20] ... (80, 85] < (85, 90] < (90, 95] < (95, 100]]
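The intervals above are closed on the right, so 85.000 lands in (80, 85], whereas the question asks for ranges like 0 - 4.999 and 5.000 - 9.999. A possible variant, sketched below, passes right=False to pd.cut() to get left-closed bins and counts a raw column directly instead of summing a precomputed count column. The Series pct and its values are hypothetical sample data, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw percentage column (one row per observation).
pct = pd.Series([0.752, 0.769, 0.769, 0.800, 4.999, 5.0, 84.615, 85.0, 85.0], name="pct")

# Edges 0, 5, ..., 100 give 20 bins of width 5.
bins = np.arange(0, 101, 5)

# right=False makes the bins [0, 5), [5, 10), ..., so 5.0 goes into the second bin.
# sort=False keeps the bins in interval order instead of ordering by frequency.
counts = pd.cut(pct, bins=bins, right=False).value_counts(sort=False)

print(counts)
```

One caveat with this variant: a value of exactly 100 would not fall into any of these bins, because the last bin is [95, 100). If such values can occur, extending the edges (for example np.arange(0, 106, 5)) is one way to catch them.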