Pandas self-join: merging/joining within the same table

Time: 2019-10-07 00:10:45

Tags: python pandas dataframe merge self-join

I have attached the data here.

Excel Data

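I need to return a DataFrame containing a list of all employees (employee ID, first name, middle name, last name) along with the first and last name of each employee's manager. The columns of the output DataFrame should be: EmployeeID, FirstName, MiddleName, LastName, ManagerFirstName, ManagerLastName.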

Hint: managers are themselves employees, so consider joining the table to itself.

Here is my code so far, which gives me duplicate records:

import numpy as np
import pandas as pd

# Create the DataFrame from the Excel file. Set Employees to the appropriate file path.
df = pd.read_excel(Employees)

df_new = df[['EmployeeID', 'ManagerID', 'FirstName', 'MiddleName', 'LastName']].copy()
# Coerce ManagerID to numeric; missing or invalid values become 0
df_new['ManagerID'] = pd.to_numeric(df_new['ManagerID'], errors='coerce').fillna(0)
# Convert the numeric result to int64
df_new['ManagerID'] = df_new['ManagerID'].astype(np.int64)

result = df_new.merge(df_new, left_on='EmployeeID', right_on='ManagerID')

print(result.head())

Any help with this would be greatly appreciated.

1 answer:

Answer 0 (score: 0)

I think this will work:

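import pandas as pd

df = pd.DataFrame({"EmployeeID":[259,278,204,78,255],
                  "ManagerID":[278,204,78,255,259],
                  "FirstName":["ben","garret","gabe","reuben","gordon"],
                  "MiddleName":["T","R","B","H","L"],
                  "LastName":["miller","vargas","mares","dsa","hee"]})

# Coerce ManagerID to numeric; missing or invalid values become 0
df['ManagerID'] = pd.to_numeric(df['ManagerID'], errors='coerce').fillna(0)
# Build a manager lookup table: one row per employee, renamed so it joins on ManagerID
df_ = df[["EmployeeID","FirstName","LastName"]]
df_ = df_.rename(columns={"EmployeeID":"ManagerID","FirstName":"ManagerFirstName","LastName":"ManagerLastName"})
# A left join keeps every employee, even one whose ManagerID matches nobody
out = pd.merge(df,df_,on=["ManagerID"],how="left")
# ManagerID is not part of the requested output
out = out.drop(["ManagerID"],axis=1)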
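
As for why the code in the question duplicates rows: merging with left_on='EmployeeID' and right_on='ManagerID' matches each manager to every one of their direct reports, so a manager with several reports shows up several times, and the default inner join drops anyone who manages nobody. The sketch below joins in the other direction (each employee's ManagerID against the manager's EmployeeID). The sample rows and the names `employees` and `manager_names` are made up for illustration, and it assumes that 0 is never a real EmployeeID, so it can stand in for "no manager" as in the question.

```python
import pandas as pd

# Hypothetical sample data: employee 78 has no manager, and
# employees 259 and 255 share the same manager (278).
employees = pd.DataFrame({
    "EmployeeID": [259, 278, 204, 78, 255],
    "ManagerID": [278, 204, 78, None, 278],
    "FirstName": ["ben", "garret", "gabe", "reuben", "gordon"],
    "MiddleName": ["T", "R", "B", "H", "L"],
    "LastName": ["miller", "vargas", "mares", "dsa", "hee"],
})

# Assumption: 0 is not a valid EmployeeID, so it marks a missing manager.
employees["ManagerID"] = (
    pd.to_numeric(employees["ManagerID"], errors="coerce").fillna(0).astype("int64")
)

# One row per potential manager, renamed so the join key is ManagerID.
manager_names = employees[["EmployeeID", "FirstName", "LastName"]].rename(
    columns={
        "EmployeeID": "ManagerID",
        "FirstName": "ManagerFirstName",
        "LastName": "ManagerLastName",
    }
)

# how="left" keeps every employee; validate raises MergeError if a
# ManagerID matches more than one row on the right, which is the
# situation that would duplicate employee rows.
result = employees.merge(
    manager_names, on="ManagerID", how="left", validate="many_to_one"
)

# Select the requested columns explicitly, in the requested order.
result = result[
    ["EmployeeID", "FirstName", "MiddleName", "LastName",
     "ManagerFirstName", "ManagerLastName"]
]
print(result)
```

With this direction of join the result has exactly one row per employee, and the manager columns are NaN for an employee whose ManagerID matches no EmployeeID.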