I'm currently stuck on how to group a dataframe into different bins based on the values present in a variable.
Here is my data:
df[['col','val']]
Out[490]:
col val
0 65 0
1 6 0
2 23 0
3 6 0
4 19 0
5 10 0
6 30 0
7 64 0
8 4 0
9 3 0
10 6 0
11 5 0
12 9 0
13 10 0
14 11 0
15 1 0
16 0 0
17 0 1
18 4 0
19 2 0
Using cut gives me this output:
df['bins'] = pd.cut(df['col'], binsize)
bins val
0 (-0.065, 13.0] 1
1 (13.0, 26.0] 0
2 (26.0, 39.0] 0
4 (52.0, 65.0] 0
What I would like to get is this output:
col Value
(0, 2] 1
(3, 5] 0
(6, 9] 0
(10, 19] 0
(23, 65] 0
Answer 0: (score: 0)
One solution is to pass the specified bins to pd.cut as an IntervalIndex:
# default is closed='right', but this would miss the first row
# of your expected output of (0, 2] 1
bins = pd.IntervalIndex.from_tuples([(0, 2),
(3, 5),
(6, 9),
(10, 19),
(23, 65)],
closed='left')
df['bins'] = pd.cut(df['col'], bins=bins)
df
col val bins
0 65 0 NaN
1 6 0 [6.0, 9.0)
2 23 0 [23.0, 65.0)
3 6 0 [6.0, 9.0)
4 19 0 NaN
5 10 0 [10.0, 19.0)
6 30 0 [23.0, 65.0)
7 64 0 [23.0, 65.0)
8 4 0 [3.0, 5.0)
9 3 0 [3.0, 5.0)
10 6 0 [6.0, 9.0)
11 5 0 NaN
12 9 0 NaN
13 10 0 [10.0, 19.0)
14 11 0 [10.0, 19.0)
15 1 0 [0.0, 2.0)
16 0 0 [0.0, 2.0)
17 0 1 [0.0, 2.0)
18 4 0 [3.0, 5.0)
19 2 0 NaN
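The NaN rows in the output above (col values 65, 19, 5, 9 and 2) all sit exactly on an upper endpoint, which closed='left' excludes. A small sketch of how the closed setting decides endpoint membership, using pd.Interval directly (the variable names are mine, for illustration only):

```python
import pandas as pd

# The same endpoints (0, 2) under three different `closed` settings.
left = pd.Interval(0, 2, closed="left")    # [0, 2)
right = pd.Interval(0, 2, closed="right")  # (0, 2]
both = pd.Interval(0, 2, closed="both")    # [0, 2]

# closed='left' keeps the lower endpoint and drops the upper one.
print(0 in left, 2 in left)    # True False
# closed='right' (the default) does the opposite.
print(0 in right, 2 in right)  # False True
# closed='both' keeps both endpoints.
print(0 in both, 2 in both)    # True True
```

So neither 'left' nor 'right' can cover both the 0 and the 2 in the first bin of the expected output.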
# Get something close to expected output: for each
# unique bin, take the maximum value
(df[['bins', 'val']].dropna()
.groupby('bins')
.max()
.reset_index())
bins val
0 [0, 2) 1
1 [3, 5) 0
2 [6, 9) 0
3 [10, 19) 0
4 [23, 65) 0
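If col only ever holds non-negative integers (an assumption on my part, based on the sample data), another way to avoid the NaN rows is to hand pd.cut plain right-closed edges and attach the desired labels yourself. The edges and label strings below are my own choice for illustration. The labels are purely cosmetic: the first bin really covers 0 through 2, and the last one would also catch 20 through 22 if such values appeared.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col": [65, 6, 23, 6, 19, 10, 30, 64, 4, 3,
            6, 5, 9, 10, 11, 1, 0, 0, 4, 2],
    "val": [0] * 17 + [1, 0, 0],
})

# Right-closed edges: (-1, 2], (2, 5], (5, 9], (9, 19], (19, 65].
# Starting at -1 puts the value 0 into the first bin.
edges = [-1, 2, 5, 9, 19, 65]
# Display labels matching the expected output in the question.
labels = ["(0, 2]", "(3, 5]", "(6, 9]", "(10, 19]", "(23, 65]"]

df["bins"] = pd.cut(df["col"], bins=edges, labels=labels)

# One row per bin, taking the maximum of val within each bin.
result = df.groupby("bins", observed=False)["val"].max().reset_index()
print(result)
```

With the sample data every row lands in a bin, and only the first bin has a val of 1.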
Answer 1: (score: 0)
At the moment I bin the data with the following SAS code, but I would like to convert it to Python:
/* &allweights = count of rows in dataset */
weight = 1;
binsize = 5;
data temp;
set temp nobs=numobs;
by dataset;
retain group nn;
nn = sum(nn,weight);
if first.&x then do;
group = floor(nn*binsize/(&allweights+1));
end;
run;
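A rough pandas sketch of what I understand the SAS step above to be doing, not a verified translation. It assumes the data is sorted ascending by the binning variable, that &x refers to col, that every weight is 1, and that the group is assigned at the first row of each distinct value and carried over to its ties. Under those assumptions, nn at first.&x is the "min" rank of the value.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col": [65, 6, 23, 6, 19, 10, 30, 64, 4, 3,
            6, 5, 9, 10, 11, 1, 0, 0, 4, 2],
    "val": [0] * 17 + [1, 0, 0],
})

binsize = 5
allweights = len(df)  # stands in for &allweights (row count, weight = 1)

# rank(method="min") gives every tied value the 1-based position of its
# first occurrence in sorted order, i.e. the running count nn at first.&x.
first_rank = df["col"].rank(method="min")

# Same formula as the SAS step: floor(nn * binsize / (&allweights + 1)).
df["group"] = np.floor(first_rank * binsize / (allweights + 1)).astype(int)

# Summarise each group by its observed value range and the maximum of val.
summary = (df.groupby("group")
             .agg(lo=("col", "min"), hi=("col", "max"), val=("val", "max"))
             .reset_index())
print(summary)
```

On the sample data this yields five groups whose observed ranges are 0 to 2, 3 to 5, 6 to 9, 10 to 19 and 23 to 65, which lines up with the expected output in the question.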