The case_when function from R to Python

Time: 2019-02-12 15:20:56

Tags: python pandas dataframe data-analysis

How can I implement R's case_when function in Python code?

Here is R's case_when function:

https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/case_when

As a minimal working example, suppose we have the following dataframe (Python code follows):

import pandas as pd
import numpy as np

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'age': [42, 52, 36, 24, 73], 
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, columns = ['name', 'age', 'preTestScore', 'postTestScore'])
df

Suppose we want to create a new column called 'elderly' that looks at the 'age' column and does the following:

if age < 10 then baby
if age >= 10 and age < 20 then kid
if age >= 20 and age < 30 then young
if age >= 30 and age < 50 then mature
if age >= 50 then grandpa

Can anyone help?

2 answers:

Answer 0 (score: 5)

You want to use np.select:

conditions = [(df['age'].lt(10)), 
              (df['age'].ge(10) & df['age'].lt(20)), 
              (df['age'].ge(20) & df['age'].lt(30)), 
              (df['age'].ge(30) & df['age'].lt(50)), 
              (df['age'].ge(50))]
choices = ['baby', 'kid', 'young', 'mature', 'grandpa']

df['elderly'] = np.select(conditions, choices)

df
    name  age  preTestScore  postTestScore  elderly
0  Jason   42             4             25   mature
1  Molly   52            24             94  grandpa
2   Tina   36            31             57   mature
3   Jake   24             2             62    young
4    Amy   73             3             70  grandpa

The conditions and choices lists must be the same length.
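As a side note, np.select returns the choice for the first condition that is True in each row, which is also how case_when evaluates its cases. A minimal sketch of what that allows, assuming a small made-up frame with only an age column: each later condition only needs its upper bound, and the last case can be passed as default. Without an explicit default the fill value is 0, which is an odd fit for string labels (and some newer NumPy versions may reject the mix), so setting it explicitly seems safer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen to hit every category and one boundary (10)
df = pd.DataFrame({"age": [5, 10, 24, 42, 73]})

# The first True condition wins for each row, so the lower bounds
# (age >= 10, age >= 20, ...) are implied by the earlier conditions.
conditions = [
    df["age"].lt(10),
    df["age"].lt(20),
    df["age"].lt(30),
    df["age"].lt(50),
]
choices = ["baby", "kid", "young", "mature"]

# default fills rows where no condition is True; here that is age >= 50
df["elderly"] = np.select(conditions, choices, default="grandpa")
print(df)
```

One caveat with this shortcut: a missing age makes every comparison False, so that row would also receive the default label. If the data can contain NaN, keeping the explicit age >= 50 condition and using a neutral default such as "unknown" is probably the better choice.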

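Answer 1 (score: 4)

np.select is great because it is a general way to assign values from a choice list based on conditions.

However, for the specific problem the OP is trying to solve, there is a concise way to achieve the same result with pandas' cut method.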


bin_cond = [-np.inf, 10, 20, 30, 50, np.inf]            # think of them as bin edges
bin_lab = ["baby", "kid", "young", "mature", "grandpa"] # the length needs to be len(bin_cond) - 1
# right=False makes each bin left-closed, e.g. [10, 20), to match "age >= 10 and age < 20"
df["elderly2"] = pd.cut(df["age"], bins=bin_cond, labels=bin_lab, right=False)

#     name  age  preTestScore  postTestScore  elderly elderly2
# 0  Jason   42             4             25   mature   mature
# 1  Molly   52            24             94  grandpa  grandpa
# 2   Tina   36            31             57   mature   mature
# 3   Jake   24             2             62    young    young
# 4    Amy   73             3             70  grandpa  grandpa
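One detail worth checking with pd.cut is which side of each bin is closed. By default (right=True) the bins are right-closed, so an age of exactly 10 lands in (-inf, 10] and is labelled baby, whereas the rules in the question put 10 in kid. The sample data has no age on an edge, so the difference does not show up there. A small sketch comparing the two settings on made-up boundary ages:

```python
import numpy as np
import pandas as pd

# Hypothetical ages sitting exactly on the bin edges
ages = pd.Series([10, 20, 30, 50])
edges = [-np.inf, 10, 20, 30, 50, np.inf]
labels = ["baby", "kid", "young", "mature", "grandpa"]

# Default right=True: bins look like (10, 20], so each edge belongs to the lower bin
right_closed = pd.cut(ages, bins=edges, labels=labels)

# right=False: bins look like [10, 20), so each edge belongs to the upper bin
left_closed = pd.cut(ages, bins=edges, labels=labels, right=False)

print(pd.DataFrame({"age": ages,
                    "right=True": right_closed,
                    "right=False": left_closed}))
```

With right=False the result agrees with the question's "age >= 10 and age < 20" style of rules. Note that pd.cut returns a categorical column, unlike the plain strings produced by np.select.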