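I am trying to create a new column in a DataFrame by looking up data in other columns and rows. What is the best (most idiomatic) way to compute the values of such a column?
I tried using a lambda and an external function, but without success.
Could someone explain how to get the final result, and which approach is best in terms of computation time?
Can we use assign with a function/lambda to compute these values?
Can a DataFrame be set up so that a column keeps a reference to the function that computes its values, rather than to the computed values themselves, so that the result updates dynamically with the data in other columns/rows?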
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John']
}
df = pd.DataFrame(data)
Original DataFrame:
ID M_Name Name
0 1 Lui Andy
1 2 Lui Rob
2 3 Lui Tony
3 4 NoData John
4 5 John Lui
data_after = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
    'ID_by_M_Name': [5, 5, 5, 'NoData', 4]
}
df1 = pd.DataFrame(data_after)
Processed DataFrame:
ID ID_by_M_Name M_Name Name
0 1 5 Lui Andy
1 2 5 Lui Rob
2 3 5 Lui Tony
3 4 NoData NoData John
4 5 4 John Lui
I have tried two ways to get the ID, but I am not sure how to use them with assign:
getID = lambda name: df.loc[df['Name'] == name]['ID'].iloc[0]
def mID(name):
    return df.loc[df['Name'] == name]['ID'].iloc[0]
For each row, we want to find the ID of the person named in M_Name.
E.g. for Name='Andy' we have M_Name='Lui', and Lui's ID is 5.
For Lui, M_Name is John, and John's ID is 4.
print(getID('Lui'))
print(mID('Lui'))
df['ID'] = df.assign(mID(df['M_Name']), axis=1 )
IndexError: single positional indexer is out-of-bounds
Answer 0 (score: 1)
Use Series.replace, or Series.map together with Series.fillna:
df['ID_by_M_Name'] = df['M_Name'].replace(df.set_index('Name')['ID'])
#assign alternative
#df = df.assign(ID_by_M_Name = df['M_Name'].replace(df.set_index('Name')['ID']))
df['ID_by_M_Name'] = df['M_Name'].map(df.set_index('Name')['ID']).fillna(df['M_Name'])
#assign alternative
#df=df.assign(ID_by_M_Name=df['M_Name'].map(df.set_index('Name')['ID']).fillna(df['M_Name']))
print (df)
ID Name M_Name ID_by_M_Name
0 1 Andy Lui 5
1 2 Rob Lui 5
2 3 Tony Lui 5
3 4 John NoData NoData
4 5 Lui John 4
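As for why the attempt in the question fails: mID(df['M_Name']) passes a whole column, so df['Name'] == name compares the two columns element by element. No row has Name equal to its own M_Name, the selection is empty, and .iloc[0] raises the IndexError. Also note that assign only takes keyword arguments of the form new_column=values_or_callable. A minimal sketch of how a scalar lookup function could be used with assign is below; m_id is a hypothetical variant of the question's mID that falls back to the input when no match exists. It calls the Python function once per row and scans the Name column each time, so on large frames it will generally be slower than replace or map.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})

def m_id(name):
    # Scalar lookup: ID of the first row whose Name equals `name`.
    # Hypothetical fallback: return the input unchanged when nothing matches
    # (e.g. 'NoData'), instead of raising IndexError.
    match = df.loc[df['Name'] == name, 'ID']
    return match.iloc[0] if not match.empty else name

# assign takes keyword arguments; a callable value receives the DataFrame.
# Series.apply then calls m_id once per M_Name value.
result = df.assign(ID_by_M_Name=lambda d: d['M_Name'].apply(m_id))
print(result)
```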
If the position of the new column matters, use DataFrame.insert:
df.insert(1, 'ID_by_M_Name', df['M_Name'].replace(df.set_index('Name')['ID']))
print (df)
ID ID_by_M_Name Name M_Name
0 1 5 Andy Lui
1 2 5 Rob Lui
2 3 5 Tony Lui
3 4 NoData John NoData
4 5 4 Lui John
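On the last part of the question: a DataFrame column stores the computed values, not the function that produced them, so the column does not update by itself when other data changes. One possible workaround, sketched below under that assumption, is to wrap the lookup in a small function and call it again whenever the data changes. The helper name with_id_by_m_name is made up for illustration, and it uses a plain dict lookup rather than the replace/map calls above.

```python
import pandas as pd

def with_id_by_m_name(frame):
    # Hypothetical helper: rebuild the derived column from the frame's
    # current contents and return a new DataFrame.
    lookup = dict(zip(frame['Name'], frame['ID']))
    # Names without a matching row (e.g. 'NoData') are kept as they are.
    derived = frame['M_Name'].map(lambda name: lookup.get(name, name))
    return frame.assign(ID_by_M_Name=derived)

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})

before = with_id_by_m_name(df)

# Change John's ID, then recompute; the earlier result is not updated.
df.loc[df['Name'] == 'John', 'ID'] = 40
after = with_id_by_m_name(df)

print(before)
print(after)
```

The derived column is only as fresh as the last call, so the function has to be re-run after every change that should be reflected.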