Get the frequency of items in a pandas column within given value intervals stored in another pandas column

Time: 2017-06-03 08:49:30

Tags: pandas pandas-groupby

My data frame:

import numpy as np
import pandas as pd

class_lst = ["B","A","C","Z","H","K","O","W","L","R","M","Y","Q","X","X","G","G","G","G","G"]
value_lst = [1,0.999986,1,0.999358,0.999906,0.995292,0.998481,0.388307,0.99608,0.99829,1,0.087298,1,1,0.999993,1,1,1,1,1]

df = pd.DataFrame(
    {'class': class_lst,
     'val': value_lst
    })

The intervals for 'val' are defined by these ranges:

ranges = np.arange(0.0, 1.1, 0.1)

I would like to get the frequency of 'val' items per class and interval, as follows:

class range  frequency
A (0, 0.10]    0
A (0.10, 0.20]    0
A (0.20, 0.30]   0
...
A (0.90, 1.00]   1
G (0, 0.10]    0
G (0.10, 0.20]    0
G (0.20, 0.30]   0
...
G (0.80, 0.90]    0
G (0.90, 1.00]   5
...

I tried:

df.groupby(pd.cut(df.val, ranges)).count()

but the output looks like this:

            class  val
val                   
(0, 0.1]        1    1
(0.1, 0.2]      0    0
(0.2, 0.3]      0    0
(0.3, 0.4]      1    1
(0.4, 0.5]      0    0
(0.5, 0.6]      0    0
(0.6, 0.7]      0    0
(0.7, 0.8]      0    0
(0.8, 0.9]      0    0
(0.9, 1]       18   18

which does not match the expected result: the counts are grouped by interval only, not by class.

1 answer:

Answer 0 (score: 2)

This might be a good start. First, assign each value to its interval:

df["range"] = pd.cut(df['val'], ranges)

       class       val       range
0      B  1.000000  (0.9, 1.0]
1      A  0.999986  (0.9, 1.0]
2      C  1.000000  (0.9, 1.0]
3      Z  0.999358  (0.9, 1.0]
4      H  0.999906  (0.9, 1.0]
5      K  0.995292  (0.9, 1.0]
6      O  0.998481  (0.9, 1.0]
7      W  0.388307  (0.3, 0.4]
8      L  0.996080  (0.9, 1.0]
9      R  0.998290  (0.9, 1.0]
10     M  1.000000  (0.9, 1.0]
11     Y  0.087298  (0.0, 0.1]
12     Q  1.000000  (0.9, 1.0]
13     X  1.000000  (0.9, 1.0]
14     X  0.999993  (0.9, 1.0]
15     G  1.000000  (0.9, 1.0]
16     G  1.000000  (0.9, 1.0]
17     G  1.000000  (0.9, 1.0]
18     G  1.000000  (0.9, 1.0]
19     G  1.000000  (0.9, 1.0]

Then group by both class and interval:

df.groupby(["class", "range"]).size()

    class  range     
A      (0.9, 1.0]    1
B      (0.9, 1.0]    1
C      (0.9, 1.0]    1
G      (0.9, 1.0]    5
H      (0.9, 1.0]    1
K      (0.9, 1.0]    1
L      (0.9, 1.0]    1
M      (0.9, 1.0]    1
O      (0.9, 1.0]    1
Q      (0.9, 1.0]    1
R      (0.9, 1.0]    1
W      (0.3, 0.4]    1
X      (0.9, 1.0]    2
Y      (0.0, 0.1]    1
Z      (0.9, 1.0]    1

This gives the correct bin for each class together with its frequency.
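The output above lists only the (class, interval) pairs that actually occur, whereas the question also asks for the empty intervals with a frequency of 0. A minimal sketch of one way to get them, assuming a pandas version that supports the `observed` argument of `groupby` (0.23 or later, so newer than this answer): because `pd.cut` returns a categorical column, passing `observed=False` should make `groupby` emit every combination of class and interval. The column name `frequency` is taken from the question's expected output; it is not something pandas produces on its own.

```python
import numpy as np
import pandas as pd

class_lst = ["B", "A", "C", "Z", "H", "K", "O", "W", "L", "R",
             "M", "Y", "Q", "X", "X", "G", "G", "G", "G", "G"]
value_lst = [1, 0.999986, 1, 0.999358, 0.999906, 0.995292, 0.998481,
             0.388307, 0.99608, 0.99829, 1, 0.087298, 1, 1, 0.999993,
             1, 1, 1, 1, 1]

df = pd.DataFrame({"class": class_lst, "val": value_lst})
ranges = np.arange(0.0, 1.1, 0.1)

# Assign each value to its half-open interval (left, right].
df["range"] = pd.cut(df["val"], ranges)

# observed=False keeps unobserved categories, so every class is paired
# with every interval and empty pairs get a size of 0.
result = (
    df.groupby(["class", "range"], observed=False)
      .size()
      .reset_index(name="frequency")
)

# Show the rows for one class, including the zero-frequency intervals.
print(result[result["class"] == "G"])
```

If a wide layout is easier to read, calling `.unstack(fill_value=0)` on the grouped sizes (before `reset_index`) should give one row per class and one column per interval.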