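I am trying to create a function that builds a new column based on conditions on other columns. The function works fine when I pass only one variable, but it does not work when two variables are needed. An example of what I am trying to do: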
### create a function called name
def name(ID, NAME):
    if ID == 1:
        return "First"
    elif ID == 2:
        return "Second"
    elif ID == 3:
        return "Third"
    elif ID == 4 and NAME == "Four":
        return "Fourth"
### apply function to dataset and view results
dataset["NAME"].apply(name).head(100)
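Most of my new column values can be determined by looking at a single variable, but a few require two. Can anyone point me toward how to do this in Python? In R I used the case_when function from dplyr, but I haven't found support for case statements in Python.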
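Answer 0 (score: 2)
You can pass each entire row of the DataFrame to the function by calling apply with axis=1, and then access individual fields of the row inside the function, like this: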
import pandas as pd
import numpy as np
def nameme(row):
    if row.ID == 1:
        return "First"
    elif row.ID == 2:
        return "Second"
    elif row.ID == 3:
        return "Third"
    elif row.ID == 4 and row.Name == 'Four':
        return "Fourth"
dataset = pd.DataFrame({'ID':[0,1,2,3,4,5],'Name':['Four']*6})
dataset.apply(nameme, axis=1)
Output:
0 None
1 First
2 Second
3 Third
4 Fourth
5 None
dtype: object
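Since the question asks for something like dplyr's case_when, here is a minimal sketch of a vectorized alternative using numpy.select, which takes a list of boolean conditions and a matching list of values. It reuses the toy dataset from this answer; the column name 'Label' is an assumption made for illustration and is not part of the original post.

```python
import numpy as np
import pandas as pd

dataset = pd.DataFrame({'ID': [0, 1, 2, 3, 4, 5], 'Name': ['Four'] * 6})

# Conditions are checked in order; for each row the first True condition wins.
conditions = [
    dataset['ID'] == 1,
    dataset['ID'] == 2,
    dataset['ID'] == 3,
    (dataset['ID'] == 4) & (dataset['Name'] == 'Four'),
]
choices = ['First', 'Second', 'Third', 'Fourth']

# Rows that match no condition get the default (None, mirroring the function above).
# 'Label' is a hypothetical column name chosen for this sketch.
dataset['Label'] = np.select(conditions, choices, default=None)
print(dataset)
```

Note that conditions spanning several columns must be combined with & and wrapped in parentheses, because and does not work element-wise on pandas Series.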
Answer 1 (score: 1)
I modified your function and created some toy data:
def name(ID, NAME):
    if ID == 1:
        return "First"
    elif ID == 2:
        return "Second"
    elif ID == 3:
        return "Third"
    elif ID == 4 and NAME == "Four":
        return "Fourth"
dataset=pd.DataFrame({'ID':[1,2,3,4,4],'NAME':[1,2,3,4,'Four']})
dataset.apply(lambda x: name(x['ID'], x['NAME']), axis=1)
Out[741]:
0 First
1 Second
2 Third
3 None  # None because this row matched none of the conditions
4 Fourth
dtype: object
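Answer 2 (score: 0)
Wen and Scott Boston have two great answers. Here is how I handle the case where your DataFrame may not have the columns you are looking for. Instead of throwing an error, the function returns None: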
def name(df):
    ID = df.get('ID')  # returns None if your DataFrame doesn't contain an 'ID' column
    NAME = df.get('NAME')  # returns None if your DataFrame doesn't contain a 'NAME' column
    if ID == 1:
        return "First"
    elif ID == 2:
        return "Second"
    elif ID == 3:
        return "Third"
    elif ID == 4 and NAME == "Four":
        return "Fourth"
data = pd.DataFrame({'ID':[1, 2, 3, 4, 4, 5], 'NAME':[1, 2, 3, 4, 'Four', 'Four']})
data['RESULT'] = data.apply(name, axis=1)
#    ID  NAME  RESULT
# 0   1     1   First
# 1   2     2  Second
# 2   3     3   Third
# 3   4     4    None
# 4   4  Four  Fourth
# 5   5  Four    None
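To illustrate the missing-column case this answer describes, the sketch below runs the same function on a copy of the data with the 'NAME' column dropped. The variable names no_name and result are assumptions introduced only for this example.

```python
import pandas as pd

def name(row):
    # Series.get returns None instead of raising when the label is absent.
    ID = row.get('ID')
    NAME = row.get('NAME')
    if ID == 1:
        return "First"
    elif ID == 2:
        return "Second"
    elif ID == 3:
        return "Third"
    elif ID == 4 and NAME == "Four":
        return "Fourth"

data = pd.DataFrame({'ID': [1, 2, 3, 4, 4, 5],
                     'NAME': [1, 2, 3, 4, 'Four', 'Four']})

# Hypothetical scenario: the 'NAME' column is not available.
no_name = data.drop(columns='NAME')
result = no_name.apply(name, axis=1)
print(result)
```

Rows with ID 1 to 3 should still be labelled, while the ID 4 rows fall through without a label because NAME is None and never equals "Four". No KeyError is raised.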