I want to use apply to create new columns in a pandas DataFrame based on the values of other columns. I get this error, but I don't understand why:
File "C:\dev\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2448, in _setitem_array
raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key
Am I misunderstanding the apply function? Can you update or create multiple columns with a single apply call?
Here is my sample data:
import pandas as pd

x = pd.DataFrame({'VP': ['Brian', 'Sarah', 'Sarah', 'Brian', 'Sarah'],
                  'Director': ['Jim', 'Ian', 'Ian', 'Jim', 'Jerry'],
                  'Requester': ['Kelly', 'Dave', 'Jordan', 'Matt', 'Rob'],
                  'VP from Query': ['Jordan', 'Justin', 'Sarah', 'Brian', 'Sarah'],
                  'Director from Query': ['Other', 'Other', 'Ian', 'Jim', 'Jerry'],
                  'Requester from Query': ['Kelly', 'Dave', 'Jordan', 'Matt', 'Rob']
                  })
x = x[['VP', 'Director', 'Requester', 'VP from Query', 'Director from Query', 'Requester from Query']]

def set_suggested_hierarchy(row):
    if row['VP'] != row['VP from Query']:
        return row[['VP', 'Director']]
    else:
        return row[['VP from Query', 'Director from Query']]

x[['Suggested VP', 'Suggested Director']] = x.apply(lambda row: set_suggested_hierarchy(row), axis=1)
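A minimal sketch that makes the cause of the error visible, using the sample data above trimmed to the four columns the function reads. The two branches return Series with different index labels (`VP`/`Director` versus `VP from Query`/`Director from Query`), so `apply` aligns the per-row results on the union of those labels and returns four columns. Assigning four columns to a two-element key is what raises the `ValueError`. The names `result` and `error` are only illustrative.

```python
import pandas as pd

# The question's sample data, trimmed to the columns the function uses.
x = pd.DataFrame({
    'VP': ['Brian', 'Sarah', 'Sarah', 'Brian', 'Sarah'],
    'Director': ['Jim', 'Ian', 'Ian', 'Jim', 'Jerry'],
    'VP from Query': ['Jordan', 'Justin', 'Sarah', 'Brian', 'Sarah'],
    'Director from Query': ['Other', 'Other', 'Ian', 'Jim', 'Jerry'],
})

def set_suggested_hierarchy(row):
    # The two branches return Series with *different* index labels.
    if row['VP'] != row['VP from Query']:
        return row[['VP', 'Director']]
    return row[['VP from Query', 'Director from Query']]

# apply aligns the returned Series on the union of their labels,
# so the result has four columns (NaN where a row lacks a label).
result = x.apply(set_suggested_hierarchy, axis=1)
print(sorted(result.columns))
print(result.shape)

# Assigning four columns to two keys is what fails.
error = None
try:
    x[['Suggested VP', 'Suggested Director']] = result
except ValueError as exc:
    error = exc
print('ValueError:', error)
```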
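Thank you very much.
Answer 0 (score: 1)
Basically, I needed to change the function so that it returns a new Series built from a list. Its default labels (0 and 1) are then the same in both branches, so apply produces exactly two columns: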
def set_suggested_hierarchy(row):
    if row['VP'] != row['VP from Query']:
        return pd.Series([row['VP'], row['Director']])
    else:
        return pd.Series([row['VP from Query'], row['Director from Query']])
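A self-contained sketch of this fix, again on the question's data trimmed to the relevant columns. The first part is the answer's approach; passing the function to `apply` directly instead of wrapping it in a lambda is a small simplification. The second part is an alternative that is not in the answer: return a plain tuple and call `apply(..., result_type='expand')`, a documented `DataFrame.apply` parameter (pandas 0.23 and later), which turns each tuple into columns labelled 0 and 1. The helper `suggested_pair` and the name `expanded` are hypothetical.

```python
import pandas as pd

x = pd.DataFrame({
    'VP': ['Brian', 'Sarah', 'Sarah', 'Brian', 'Sarah'],
    'Director': ['Jim', 'Ian', 'Ian', 'Jim', 'Jerry'],
    'VP from Query': ['Jordan', 'Justin', 'Sarah', 'Brian', 'Sarah'],
    'Director from Query': ['Other', 'Other', 'Ian', 'Jim', 'Jerry'],
})

def set_suggested_hierarchy(row):
    # pd.Series([...]) is labelled 0, 1 in both branches, so rows align.
    if row['VP'] != row['VP from Query']:
        return pd.Series([row['VP'], row['Director']])
    return pd.Series([row['VP from Query'], row['Director from Query']])

# The two result columns are assigned to the two keys in order.
x[['Suggested VP', 'Suggested Director']] = x.apply(set_suggested_hierarchy, axis=1)

# Alternative (hypothetical helper): return a tuple and let apply expand it.
def suggested_pair(row):
    if row['VP'] != row['VP from Query']:
        return row['VP'], row['Director']
    return row['VP from Query'], row['Director from Query']

expanded = x.apply(suggested_pair, axis=1, result_type='expand')
print(x[['Suggested VP', 'Suggested Director']])
print(expanded)
```

Both routes rely on every row producing results with the same labels; that is the property the original `row[[...]]` version lacked.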
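Answer 1 (score: 0)
One solution is to have the function write the new values into the row and return the whole row. Since you are applying it to the entire DataFrame, the result is a new DataFrame with all the original columns plus the two new ones: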
def set_suggested_hierarchy(row):
    if row['VP'] != row['VP from Query']:
        row['Suggested VP'] = row['VP']
        row['Suggested Director'] = row['Director']
    else:
        row['Suggested VP'] = row['VP from Query']
        row['Suggested Director'] = row['Director from Query']
    return row

x = x.apply(lambda row: set_suggested_hierarchy(row), axis=1)
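A runnable sketch of this approach on the trimmed sample data. Because every returned row carries the same labels, `apply` rebuilds the whole frame with the two extra columns. The `row.copy()` call is a defensive addition that is not in the answer: it avoids mutating the Series object pandas passes to the function, which internally may be reused between rows.

```python
import pandas as pd

x = pd.DataFrame({
    'VP': ['Brian', 'Sarah', 'Sarah', 'Brian', 'Sarah'],
    'Director': ['Jim', 'Ian', 'Ian', 'Jim', 'Jerry'],
    'VP from Query': ['Jordan', 'Justin', 'Sarah', 'Brian', 'Sarah'],
    'Director from Query': ['Other', 'Other', 'Ian', 'Jim', 'Jerry'],
})

def set_suggested_hierarchy(row):
    # Defensive copy (an assumption, not part of the answer).
    row = row.copy()
    if row['VP'] != row['VP from Query']:
        row['Suggested VP'] = row['VP']
        row['Suggested Director'] = row['Director']
    else:
        row['Suggested VP'] = row['VP from Query']
        row['Suggested Director'] = row['Director from Query']
    return row

# Every returned row has the same six labels, so x gains two columns.
x = x.apply(set_suggested_hierarchy, axis=1)
print(x)
```

The cost of this design is that the whole frame is rebuilt row by row, which is noticeably slower than assigning two columns on large data.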
Answer 2 (score: 0)
I think you should get rid of apply(axis=1) altogether. It looks like your logic can be implemented as:
import numpy as np

x['Suggested VP'] = x.VP
x['Suggested Director'] = np.where(x.VP != x['VP from Query'],
                                   x.Director, x['Director from Query'])
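A self-contained sketch of the vectorized version, with the reasoning behind `Suggested VP = VP` spelled out: when `VP` differs from `VP from Query` the original function picks `VP`, and otherwise it picks `VP from Query`, which is then equal to `VP`, so the result is always `VP`. Only the director actually depends on the comparison. The cross-check against the row-wise version from the first answer is added for illustration, and the names `mismatch` and `check` are hypothetical.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({
    'VP': ['Brian', 'Sarah', 'Sarah', 'Brian', 'Sarah'],
    'Director': ['Jim', 'Ian', 'Ian', 'Jim', 'Jerry'],
    'VP from Query': ['Jordan', 'Justin', 'Sarah', 'Brian', 'Sarah'],
    'Director from Query': ['Other', 'Other', 'Ian', 'Jim', 'Jerry'],
})

# Both branches of the original function yield VP, so no condition is needed.
x['Suggested VP'] = x['VP']

# Only the director depends on whether the two VP columns disagree.
mismatch = x['VP'] != x['VP from Query']
x['Suggested Director'] = np.where(mismatch, x['Director'], x['Director from Query'])

# Cross-check against the row-wise apply version.
def set_suggested_hierarchy(row):
    if row['VP'] != row['VP from Query']:
        return pd.Series([row['VP'], row['Director']])
    return pd.Series([row['VP from Query'], row['Director from Query']])

check = x.apply(set_suggested_hierarchy, axis=1)
assert (check[0] == x['Suggested VP']).all()
assert (check[1] == x['Suggested Director']).all()
print(x[['Suggested VP', 'Suggested Director']])
```

A pure-pandas equivalent of the `np.where` line would be `x['Director'].where(mismatch, x['Director from Query'])`, which keeps the result as a Series aligned on the index.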