我有两个熊猫数据帧,如下所示:
import pandas as pd
import numpy as np
import string
size = 5
student_names = [''.join(np.random.choice(list(string.ascii_lowercase), size=4)) for i in range(size)]
marks = list(np.random.randint(50, high=100, size=size))
df1 = pd.DataFrame({'Student Names': student_names, 'Total': marks})
grade_leters = ['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D',
'D-', 'F']
grade_minimum_value = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 0]
df2 = pd.DataFrame({'Grade Letters': grade_leters, 'Minimums': grade_minimum_value})
df1
Student Names Total
0 cjpv 83
1 iywm 98
2 jhhb 87
3 qwau 70
4 ppai 82
df2
Grade Letters Minimums
0 A+ 95
1 A 90
2 A- 85
3 B+ 80
4 B 75
5 B- 70
6 C+ 65
7 C 60
8 C- 55
9 D+ 50
10 D 45
11 D- 40
12 F 0
我想将等级字母作为新列提供给df1
。例如,总分为83的学生cjpv
将收到B+
的成绩字母,因为83在80(含)和85(不含)之间。
所需的输出如下。
Student Names Total Grade
0 cjpv 83 B+
1 iywm 98 A+
2 jhhb 87 A-
3 qwau 70 B-
4 ppai 82 B+
先谢谢了。如果有与此类似的问题,我深表歉意。但是,经过长时间的搜索,我找不到一个问题。
答案 0 :(得分:3)
将cut
与动态值容器和df2
列中的标签一起使用,还为左侧封闭的容器添加right=False
:
np.random.seed(123)
size = 20
student_names = [''.join(np.random.choice(list(string.ascii_lowercase), size=4)) for i in range(size)]
marks = list(np.random.randint(50, high=100, size=size))
df1 = pd.DataFrame({'Student Names': student_names, 'Total': marks})
grade_leters = ['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D',
'D-', 'F']
grade_minimum_value = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 0]
df2 = pd.DataFrame({'Grade Letters': grade_leters, 'Minimums': grade_minimum_value})
df2 = df2.sort_values('Minimums')
df1['new'] = pd.cut(df1['Total'],
bins=df2['Minimums'].tolist() + [np.inf],
labels=df2['Grade Letters'],
right=False)
print (df1)
Student Names Total new
0 nccg 70 B-
1 rtkz 99 A+
2 wbar 62 C
3 pjao 68 C+
4 apzt 67 C+
5 oeaq 51 D+
6 erxd 94 A
7 cuhc 91 A
8 upyq 98 A+
9 hjdu 77 B
10 gbvw 99 A+
11 cbmi 72 B-
12 dkfa 53 D+
13 lckw 53 D+
14 nsep 61 C
15 lmug 71 B-
16 ntqg 75 B
17 ouhl 89 A-
18 whbl 91 A
19 fxzs 84 B+
像@Henry Yik评论的一样,可以在merge_asof
中使用:
df1 = pd.merge_asof(df1.sort_values('Total'), df2, left_on='Total', right_on='Minimums')
print (df1)
Student Names Total new Grade Letters Minimums
0 oeaq 51 D+ D+ 50
1 lckw 53 D+ D+ 50
2 dkfa 53 D+ D+ 50
3 nsep 61 C C 60
4 wbar 62 C C 60
5 apzt 67 C+ C+ 65
6 pjao 68 C+ C+ 65
7 nccg 70 B- B- 70
8 lmug 71 B- B- 70
9 cbmi 72 B- B- 70
10 ntqg 75 B B 75
11 hjdu 77 B B 75
12 fxzs 84 B+ B+ 80
13 ouhl 89 A- A- 85
14 whbl 91 A A 90
15 cuhc 91 A A 90
16 erxd 94 A A 90
17 upyq 98 A+ A+ 95
18 rtkz 99 A+ A+ 95
19 gbvw 99 A+ A+ 95