Look up / search values in a DataFrame to create a new column

Time: 2019-09-11 11:00:34

Tags: python pandas dataframe series

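I am trying to create a new column in a DataFrame by searching data in other columns and rows. What is the best way (and what should be avoided) to compute such column values?

I tried using a lambda and an external function, but without success.

  1. Could someone explain in detail how to get the final result, and which approach is best in terms of computation time?

  2. Can we assign a function/lambda to compute these values?

  3. Can we implement the DataFrame so that a column keeps a reference to the function that computes its values, rather than to the computed values themselves? That is, dynamic results based on data in other columns/rows.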
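Regarding questions 2 and 3: `DataFrame.assign` does accept a callable (a lambda works the same way as a named function); pandas calls it with the DataFrame and uses the returned Series as the new column. A pandas column, however, stores computed values, not a formula, so it will not update itself when its inputs change. A minimal sketch, assuming the sample data from the question; `id_by_m_name` is a hypothetical helper used only for illustration, and the idea is simply to keep the "formula" in a function and reapply it after the inputs change:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})


# Hypothetical helper: holds the computation for the column, not its values.
def id_by_m_name(frame):
    # Series indexed by Name, holding each person's ID.
    lookup = frame.set_index('Name')['ID']
    # Look up every M_Name; names with no match (e.g. 'NoData') become NaN,
    # and fillna() puts the original text back in those rows.
    return frame['M_Name'].map(lookup).fillna(frame['M_Name'])


# assign() calls the function with the DataFrame and stores the result.
df = df.assign(ID_by_M_Name=id_by_m_name)

# The stored column is plain data: after changing an input,
# reapply the same function to refresh it.
df.loc[df['Name'] == 'John', 'ID'] = 40
df = df.assign(ID_by_M_Name=id_by_m_name)
print(df)
```

After the second `assign`, Lui's row picks up John's new ID (40), but only because the function was called again.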

import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
}

df = pd.DataFrame(data)

Original DataFrame:
    ID  M_Name  Name
0   1     Lui  Andy
1   2     Lui   Rob
2   3     Lui  Tony
3   4  NoData  John
4   5    John   Lui

data_after = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
    'ID_by_M_Name': [5, 5, 5, 'NoData', 4],
}

df1 = pd.DataFrame(data_after)

Processed DataFrame:
    ID ID_by_M_Name  M_Name  Name
0   1          5     Lui  Andy
1   2          5     Lui   Rob
2   3          5     Lui  Tony
3   4     NoData  NoData  John
4   5          4    John   Lui

I have tried two ways to get the ID, but I am not sure how to use them with assign:

getID = lambda name: df.loc[df['Name'] == name]['ID'].iloc[0]

def mID(name):
    return df.loc[df['Name'] == name]['ID'].iloc[0]

For each row, we want to find the ID of the person named in M_Name.
E.g. for Name='Andy', M_Name is 'Lui', and Lui's ID is 5.
For Lui, M_Name is 'John', and John's ID is 4.
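The lookup described above can be sketched with a plain Python dictionary built once from `Name` and `ID`. This assumes each `Name` is unique; `name_to_id` is an illustrative variable name, not something from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})

# Build the Name -> ID lookup once (assumes every Name is unique).
name_to_id = dict(zip(df['Name'], df['ID']))

# For each row, fetch the ID of the person in M_Name;
# get(m, m) keeps unmatched values such as 'NoData' unchanged.
df['ID_by_M_Name'] = [name_to_id.get(m, m) for m in df['M_Name']]
print(df)
```

Falling back to the original value reproduces the `'NoData'` entry in the expected `ID_by_M_Name` column.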

print(getID('Lui'))
print(mID('Lui'))

df['ID'] = df.assign(mID(df['M_Name']), axis=1 )

IndexError: single positional indexer is out-of-bounds
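The error happens because `mID(df['M_Name'])` passes the whole column at once: `df['Name'] == name` then compares the two columns row by row (Andy vs Lui, Rob vs Lui, and so on), no row matches, and `.iloc[0]` on the empty result raises `IndexError`. A sketch of calling the function once per value instead; the fallback to the original name when nothing matches is an assumption added so that `'NoData'` does not raise:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})


def mID_original(name):
    # The question's version: assumes exactly one matching row exists.
    return df.loc[df['Name'] == name]['ID'].iloc[0]


def mID(name):
    # Same lookup for a single name, but returns the name itself
    # when no row matches (assumed behavior, e.g. for 'NoData').
    match = df.loc[df['Name'] == name, 'ID']
    return match.iloc[0] if not match.empty else name


# Passing the whole column compares Name and M_Name row by row,
# nothing matches, and .iloc[0] on the empty result fails.
raised = False
try:
    mID_original(df['M_Name'])
except IndexError as exc:
    raised = True
    print('IndexError:', exc)

# Calling the function once per value works.
df['ID_by_M_Name'] = df['M_Name'].apply(mID)
print(df)
```

This works, but it scans the whole DataFrame once per row, so it will likely be much slower than a vectorized lookup on large data.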

1 answer:

Answer 0 (score: 1)

Series.replaceSeries.mapSeries.fillna一起使用:

df['ID_by_M_Name'] = df['M_Name'].replace(df.set_index('Name')['ID'])
#assign alternative
#df = df.assign(ID_by_M_Name = df['M_Name'].replace(df.set_index('Name')['ID']))
df['ID_by_M_Name'] = df['M_Name'].map(df.set_index('Name')['ID']).fillna(df['M_Name'])
#assign alternative
#df=df.assign(ID_by_M_Name=df['M_Name'].map(df.set_index('Name')['ID']).fillna(df['M_Name']))

print (df)

   ID  Name  M_Name ID_by_M_Name
0   1  Andy     Lui            5
1   2   Rob     Lui            5
2   3  Tony     Lui            5
3   4  John  NoData       NoData
4   5   Lui    John            4
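To see what each step of the answer does, the intermediate results can be printed separately. A minimal sketch on the same sample data; `lookup`, `mapped`, `filled` and `replaced` are illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})

# Series with index = Name and values = ID.
lookup = df.set_index('Name')['ID']

# map() looks up every M_Name in `lookup`; 'NoData' has no match -> NaN.
mapped = df['M_Name'].map(lookup)
print(mapped)

# fillna() restores the original text wherever the lookup failed.
filled = mapped.fillna(df['M_Name'])
print(filled)

# replace() leaves unmatched values untouched, so no fillna() is needed.
replaced = df['M_Name'].replace(lookup)
print(replaced)
```

`mapped` holds floats (5.0, 4.0) because the `NaN` for `'NoData'` forces a float dtype; `replace` skips that intermediate step by leaving unmatched values as they are.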

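If the position of the new column matters, use DataFrame.insert (on the original df, since insert raises a ValueError if the column already exists):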

df.insert(1, 'ID_by_M_Name', df['M_Name'].replace(df.set_index('Name')['ID']))
print (df)

   ID ID_by_M_Name  Name  M_Name
0   1            5  Andy     Lui
1   2            5   Rob     Lui
2   3            5  Tony     Lui
3   4       NoData  John  NoData
4   5            4   Lui    John
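If the new column should always sit right after `ID`, its position can be computed with `Index.get_loc` rather than hard-coded as `1`. A small variation, assuming a freshly built `df` from the question; `new_col` is an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})

new_col = df['M_Name'].replace(df.set_index('Name')['ID'])

# get_loc() returns the integer position of the 'ID' column label,
# so the new column lands directly after it.
df.insert(df.columns.get_loc('ID') + 1, 'ID_by_M_Name', new_col)
print(df)
```

This keeps the placement correct even if columns are later added in front of `ID`.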