How to apply a Python lambda function to a categorical DataFrame column

Date: 2018-08-15 16:56:00

Tags: python pandas

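How can we apply a lambda function to this categorical DataFrame? Note that the grades are categorical. I want grades above C to get "Pass", but they show "Fail" instead.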

import pandas as pd

dfg = pd.DataFrame(['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D'],
                   index=['excellent', 'excellent', 'excellent', 'good', 'good', 'good', 'ok', 'ok', 'ok', 'poor', 'poor'])
dfg.rename(columns={0: 'Grades'}, inplace=True)
# Ordered categorical: 'D' is the lowest grade, 'A+' the highest
# (astype('category', categories=..., ordered=...) no longer works in current pandas)
grade_type = pd.CategoricalDtype(
    categories=['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+'],
    ordered=True)
dfg['Grades'] = dfg['Grades'].astype(grade_type)

def Assess(row):
    # Intended: return 'Pass' for any grade ranked above 'C'
    if row > 'C':
        return 'Pass'
    return 'Fail'

dfg['Asses'] = dfg.apply(lambda x: Assess(x.Grades), axis=1)

dfg

Result:

          Grades Asses
excellent     A+  Fail
excellent      A  Fail
excellent     A-  Fail
good          B+  Fail
good           B  Fail
good          B-  Fail
ok            C+  Pass
ok             C  Fail
ok            C-  Pass
poor          D+  Pass
poor           D  Pass
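The pattern in this output matches plain str comparison: lexicographically, 'A+', 'A', 'B' and so on sort before 'C', while 'C+', 'C-', 'D+' and 'D' sort after it. A minimal sketch contrasting that with a comparison on an ordered categorical, assuming a recent pandas where pd.CategoricalDtype is available (the variable names are illustrative only):

```python
import pandas as pd

grades = ['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+']

# Plain str comparison is lexicographic: 'A+' < 'C' and 'C-' > 'C'
string_result = [g > 'C' for g in grades]

# An ordered categorical compares by position in the category list instead
ordered = pd.Series(grades, dtype=pd.CategoricalDtype(grades, ordered=True))
categorical_result = (ordered > 'C').tolist()

print(string_result)
print(categorical_result)
```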

2 answers:

Answer 0 (score: 4)

Use:

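import numpy as np

# Vectorized comparison on the ordered categorical column, then label each row
dfg['Assess'] = np.where(dfg['Grades'] > 'C', 'Pass', 'Fail')
dfg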

Output:

         Grades Assess
excellent     A+  Pass
excellent      A  Pass
excellent     A-  Pass
good          B+  Pass
good           B  Pass
good          B-  Pass
ok            C+  Pass
ok             C  Fail
ok            C-  Fail
poor          D+  Fail
poor           D  Fail
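The > in this answer relies on two conditions: the column must be an ordered categorical, and the threshold 'C' must be one of its categories. A small sketch checking both, assuming a recent pandas (raises_type_error is a hypothetical helper written just for this illustration):

```python
import numpy as np
import pandas as pd

order = ['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+']
values = ['A', 'C', 'D']

ordered = pd.Series(values, dtype=pd.CategoricalDtype(order, ordered=True))
unordered = pd.Series(values, dtype=pd.CategoricalDtype(order, ordered=False))

# Ordered categorical: '>' follows the category order
labels = np.where(ordered > 'C', 'Pass', 'Fail').tolist()

def raises_type_error(func):
    # Hypothetical helper: True if calling func raises TypeError
    try:
        func()
    except TypeError:
        return True
    return False

# Unordered categoricals only support == and !=
unordered_fails = raises_type_error(lambda: unordered > 'C')
# A threshold that is not one of the categories cannot be ranked
unknown_fails = raises_type_error(lambda: ordered > 'E')

print(labels, unordered_fails, unknown_fails)
```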

Answer 1 (score: 2)

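The way you use apply, your function receives a plain string rather than a categorical value, so row > 'C' becomes an ordinary lexicographic string comparison.

Instead, use the comparison operator on the Series itself and let pandas handle its categorical nature.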

dfg.assign(Assess=dfg.Grades > 'C')

          Grades Assess
excellent     A+   True
excellent      A   True
excellent     A-   True
good          B+   True
good           B   True
good          B-   True
ok            C+   True
ok             C  False
ok            C-  False
poor          D+  False
poor           D  False
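For an ordered categorical, this comparison effectively ranks each value by its position in categories. A sketch under that assumption, reproducing the same booleans from the integer codes (the sample values and variable names are illustrative):

```python
import pandas as pd

order = ['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+']
grades = pd.Series(['A+', 'C', 'D+', 'B-'],
                   dtype=pd.CategoricalDtype(order, ordered=True))

# Integer position of each grade within `order` (D -> 0, ..., A+ -> 10)
codes = grades.cat.codes
# Position of the threshold grade 'C' in the category list
threshold = grades.cat.categories.get_loc('C')

via_codes = (codes > threshold).tolist()
via_categorical = (grades > 'C').tolist()

print(via_codes, via_categorical)
```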

You can follow this up with map to turn the booleans into Pass/Fail:

dfg.assign(Asses=dfg.Grades.gt('C').map({True: 'Pass', False: 'Fail'}))

          Grades Asses
excellent     A+  Pass
excellent      A  Pass
excellent     A-  Pass
good          B+  Pass
good           B  Pass
good          B-  Pass
ok            C+  Pass
ok             C  Fail
ok            C-  Fail
poor          D+  Fail
poor           D  Fail

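If you really do want a lambda (I wouldn't), you need to build a dictionary that maps each letter grade back to a numeric value, namely its position in the ordered categories.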

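# Map each grade to its position in the categories: {'D': 0, 'D+': 1, ..., 'A+': 10}
m = dict(map(reversed, enumerate(dfg.Grades.cat.categories)))
# Compare numeric positions instead of the raw strings
dfg.assign(Asses=dfg.apply(lambda row: 'Pass' if m[row.Grades] > m['C'] else 'Fail', axis=1))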

          Grades Asses
excellent     A+  Pass
excellent      A  Pass
excellent     A-  Pass
good          B+  Pass
good           B  Pass
good          B-  Pass
ok            C+  Pass
ok             C  Fail
ok            C-  Fail
poor          D+  Fail
poor           D  Fail
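dict(map(reversed, enumerate(...))) builds a {grade: position} dictionary by flipping each (position, grade) pair. A dict comprehension is a possibly more readable way to build the same mapping; this sketch assumes the category order from the question, and assess is a hypothetical helper name:

```python
import pandas as pd

order = ['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+']
categories = pd.Index(order)

# Same mapping as dict(map(reversed, enumerate(...))), written as a comprehension
m = {grade: rank for rank, grade in enumerate(categories)}
m_original = dict(map(reversed, enumerate(categories)))

def assess(grade):
    # Hypothetical helper: compare numeric ranks instead of raw strings
    return 'Pass' if m[grade] > m['C'] else 'Fail'

print(m['C'], assess('C+'), assess('A+'), assess('C-'))
```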