Suppose I want to classify the following data into 12 categories:
no. grades
0 9.08
1 8.31
2 7.42
3 7.42
4 7.42
5 7.46
6 9.67
7 11.77
8 8.81
9 6.44
10 9.40
11 9.06
12 10.52
13 6.19
14 5.04
15 5.04
16 9.44
17 5.87
18 2.67
19 6.99
20 9.08
21 6.64
22 4.83
23 4.47
24 6.61
25 6.61
26 7.42
27 6.42
28 10.00
29 9.11
It can be done like this:
df.loc[(df.a > 0) & (df.a <= 1), 'a'] = 1
df.loc[(df.a > 1) & (df.a <= 2), 'a'] = 2
.
.
.
df.loc[(df.a > 11) & (df.a <= 12), 'a'] = 12
Is there another way to classify items into categories with constant, equal-width intervals?
P.S.
My data is below; I want to classify its grades column:
psechoice hscath grades faminc famsiz parcoll female black
0 1 0 9.08 62.50 5 0 0 0
1 1 0 8.31 42.50 4 0 1 0
2 1 0 7.42 62.50 4 0 1 0
3 1 0 7.42 62.50 4 0 1 0
4 1 0 7.42 62.50 4 0 1 0
5 1 0 7.46 12.50 2 0 1 0
6 1 0 9.67 30.00 5 0 0 0
7 0 0 11.77 42.50 4 0 0 0
8 1 0 8.81 17.50 3 0 1 0
9 1 0 6.44 42.50 6 0 0 0
10 1 0 9.40 30.00 5 1 0 0
11 1 0 9.06 62.50 6 0 0 0
12 0 0 10.52 62.50 3 0 0 0
13 1 0 6.19 62.50 2 0 1 0
14 1 0 5.04 42.50 6 0 1 0
15 1 0 5.04 42.50 6 0 1 0
16 0 0 9.44 22.50 2 0 1 0
17 1 1 5.87 87.50 5 1 0 0
18 1 1 2.67 62.50 4 0 0 0
19 1 1 6.99 42.50 5 0 0 0
20 1 1 9.08 150.00 4 1 1 0
21 1 0 6.64 42.50 9 0 1 0
22 1 1 4.83 0.50 4 1 0 0
23 1 1 4.47 62.50 3 0 1 0
24 1 1 6.61 87.50 6 1 0 0
25 1 1 6.61 87.50 6 1 0 0
26 1 1 7.42 42.50 4 1 0 0
27 1 1 6.42 87.50 5 1 0 0
28 1 0 10.00 8.75 4 1 1 0
29 1 0 9.11 22.50 3 0 0 1
Answer 0 (score: 5)
You can use pd.cut to assign values to categories:
import pandas as pd
df = pd.DataFrame(
    {'grades': [9.08, 8.31, 7.42, 7.42, 7.42, 7.46, 9.67, 11.77,
                8.81, 6.44, 9.40, 9.06, 10.52, 6.19, 5.04, 5.04, 9.44, 5.87,
                2.67, 6.99, 9.08, 6.64, 4.83, 4.47, 6.61, 6.61, 7.42, 6.42,
                10.0, 9.11],
     'no.': range(30)})
df['category'] = pd.cut(df['grades'], bins=range(0, 13), labels=range(1, 13))
print(df)
yields
grades no. category
0 9.08 0 10
1 8.31 1 9
2 7.42 2 8
3 7.42 3 8
4 7.42 4 8
5 7.46 5 8
6 9.67 6 10
7 11.77 7 12
...
With pd.cut(..., bins=range(0, 13)), the categories are
[(0, 1] < (1, 2] < (2, 3] < (3, 4] ... (8, 9] < (9, 10] < (10, 11] < (11, 12]]
Note that the intervals are open on the left and closed on the right.
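If values that fall exactly on a bin edge matter, the `right` parameter of pd.cut controls which side of each interval is closed. The following is only a minimal sketch: the four sample values are made up for illustration (they are not taken from the question's data) and were chosen so that two of them land exactly on an edge.

```python
import pandas as pd

# Hypothetical sample values; 7.0 and 12.0 sit exactly on bin edges.
grades = pd.Series([2.67, 7.0, 9.08, 12.0])

bins = list(range(0, 13))    # edges 0, 1, ..., 12
labels = list(range(1, 13))  # one label per interval

# Default (right=True): intervals are (0, 1], (1, 2], ..., (11, 12]
right_closed = pd.cut(grades, bins=bins, labels=labels)

# right=False: intervals are [0, 1), [1, 2), ..., [11, 12)
left_closed = pd.cut(grades, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'grades': grades,
                    'right_closed': right_closed,
                    'left_closed': left_closed}))
```

With the default, 7.0 falls in (6, 7] and gets label 7; with right=False it falls in [7, 8) and gets label 8. A value equal to the last edge (12.0 here) lies outside [11, 12) and comes back as NaN when right=False, so the choice depends on which boundary values the data can contain.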