Is there a Pythonic way to iterate over two DataFrames and compare their rows?

Time: 2019-11-21 17:20:54

Tags: python pandas dataframe

Given the following two DataFrames:

df1 # read from an Excel spreadsheet

import pandas as pd

data1 = {'ID':['1','2'],
         'Prod Family Desc':['Install','Maintenance'], 'Prod Family Code':['',''],
         'Prod Type Desc':['Installation Serice','Maintenance Service'],'Prod Type Code':['',''],
        }
df1 = pd.DataFrame(data1)
print(df1)

Resulting df1:

  ID Prod Family Desc Prod Family Code       Prod Type Desc Prod Type Code
0  1          Install                   Installation Serice
1  2      Maintenance                   Maintenance Service

df2 # this is the result of a SQL query

data2 = {'Prod Class':['F','F','T','T'],
        'Prod Desc':['Install','Maintenance','Installation Serice','Maintenance Service'],'Prod Code':['2525','2534','H123','H321']
        }

df2 = pd.DataFrame(data2) 
print(df2)

Resulting df2:

  Prod Class            Prod Desc Prod Code
0          F              Install      2525
1          F          Maintenance      2534
2          T  Installation Serice      H123
3          T  Maintenance Service      H321

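What is the best way to assign the product family code and product type code from df2 to the Prod Family Code and Prod Type Code columns of df1?

This is what I am doing: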

stype = df2.loc[df2['Prod Class'] == "T"] 

family = df2.loc[df2['Prod Class'] == "F"]

for i, concaterow in df1.iterrows():
    for j, styp in stype.iterrows():

        if (concaterow['Prod Type Desc'] == styp['Prod Desc']):
            df1.loc[i,'Prod Type Code'] = styp['Prod Code']

    for j, scat in family.iterrows():
        if (concaterow['Prod Family Desc'] == scat['Prod Desc']):
            df1.loc[i,'Prod Family Code'] = scat['Prod Code']

print(df1)

The result is as expected:

  ID Prod Family Desc Prod Family Code       Prod Type Desc Prod Type Code
0  1          Install             2525  Installation Serice           H123
1  2      Maintenance             2534  Maintenance Service           H321

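Is there a more Pythonic way to do this kind of operation?

Edit: in response to @FatihAkici's question.

@FatihAkici - Since df2 is the result of a SQL query, my expected result is the latest value inserted into the table. So, given df2 as follows: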

data2 = {'Prod Class':['F','F','F','T','T'],
        'Prod Desc':['Install','Maintenance','Install','Installation Serice','Maintenance Service'],'Prod Code':['2525','2534','2536','H123','H321']
        }

The expected result would be:

  ID Prod Family Desc Prod Family Code       Prod Type Desc Prod Type Code
0  1          Install             2536  Installation Serice           H123
1  2      Maintenance             2534  Maintenance Service           H321

2 answers:

Answer 0: (score: 1)

You can combine pd.DataFrame.assign with pd.DataFrame.merge:

df1.assign(**{
    "Prod Family Code" : df1.merge(df2, left_on = "Prod Family Desc", right_on = "Prod Desc")["Prod Code"],
    "Prod Type Code"   : df1.merge(df2, left_on = "Prod Type Desc", right_on = "Prod Desc")["Prod Code"]})
  

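In your example, the DataFrame df1 contains two empty columns, Prod Family Code and Prod Type Code, which receive the results, but this is not a requirement of this approach.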
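One caveat about the snippet above: the inner merge returns a freshly numbered index, so the assigned codes line up with df1 only when every description matches exactly one row of df2. With the duplicated 'Install' row from the edited question, that alignment would no longer hold. A minimal sketch of an alternative follows, assuming that the last row of df2 for a given description is the latest one. It builds one lookup per Prod Class and applies it with Series.map. The helper name code_lookup is hypothetical, not part of pandas or of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['1', '2'],
    'Prod Family Desc': ['Install', 'Maintenance'],
    'Prod Type Desc': ['Installation Serice', 'Maintenance Service'],
})
# df2 from the edited question, with 'Install' appearing twice.
df2 = pd.DataFrame({
    'Prod Class': ['F', 'F', 'F', 'T', 'T'],
    'Prod Desc': ['Install', 'Maintenance', 'Install',
                  'Installation Serice', 'Maintenance Service'],
    'Prod Code': ['2525', '2534', '2536', 'H123', 'H321'],
})


def code_lookup(codes, prod_class):
    """Return a Series mapping Prod Desc -> Prod Code for one Prod Class.

    Assumption: later rows are newer, so keep='last' keeps the latest code.
    """
    subset = codes[codes['Prod Class'] == prod_class]
    subset = subset.drop_duplicates('Prod Desc', keep='last')
    return subset.set_index('Prod Desc')['Prod Code']


# map() looks each description up in the per-class Series; descriptions
# with no match in df2 would come back as NaN.
result = df1.assign(**{
    'Prod Family Code': df1['Prod Family Desc'].map(code_lookup(df2, 'F')),
    'Prod Type Code': df1['Prod Type Desc'].map(code_lookup(df2, 'T')),
})
print(result)
```

Because map works on values rather than row positions, the result does not depend on the order or number of rows in df2. If the SQL query can return a timestamp column, sorting by it before drop_duplicates would make the "latest" assumption explicit.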

Answer 1: (score: 0)

I believe a merge will do what you need:

df1.merge(df2, how='left', left_on=['Prod Family Desc'], right_on=['Prod Desc'])
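
The call above matches only on Prod Family Desc and brings in df2's columns under their original names. As a rough sketch of how the same idea might be extended to both code columns, the example below filters df2 by Prod Class, renames the columns to the names used in df1, and performs two left merges. It assumes df1 does not already contain the empty code columns (otherwise pandas would add _x/_y suffixes) and that each description occurs once per class in df2. The helper name codes_for is hypothetical.

```python
import pandas as pd

# df1 without the empty code columns, so the merged columns keep their names.
df1 = pd.DataFrame({
    'ID': ['1', '2'],
    'Prod Family Desc': ['Install', 'Maintenance'],
    'Prod Type Desc': ['Installation Serice', 'Maintenance Service'],
})
df2 = pd.DataFrame({
    'Prod Class': ['F', 'F', 'T', 'T'],
    'Prod Desc': ['Install', 'Maintenance',
                  'Installation Serice', 'Maintenance Service'],
    'Prod Code': ['2525', '2534', 'H123', 'H321'],
})


def codes_for(codes, prod_class, desc_col, code_col):
    """Select one Prod Class and rename its columns to match df1."""
    subset = codes.loc[codes['Prod Class'] == prod_class,
                       ['Prod Desc', 'Prod Code']]
    return subset.rename(columns={'Prod Desc': desc_col,
                                  'Prod Code': code_col})


# Two left merges: one for family codes, one for type codes.
merged = (
    df1
    .merge(codes_for(df2, 'F', 'Prod Family Desc', 'Prod Family Code'),
           how='left', on='Prod Family Desc')
    .merge(codes_for(df2, 'T', 'Prod Type Desc', 'Prod Type Code'),
           how='left', on='Prod Type Desc')
)
print(merged)
```

A left merge keeps every row of df1 even when no code is found, but a description that appears more than once in df2 would duplicate the matching df1 row, so duplicates should be dropped first in that case.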