How to replace values in a second dataframe based on values in the first dataframe

Asked: 2017-01-10 23:16:44

Tags: python pandas

I have two dataframes, df and df1:

df

Product_name     Name        City
Rice             Chetwynd    Chetwynd, British Columbia, Canada
Wheat            Yuma        Yuma, AZ, United States
Sugar            Dochra      Singleton, New South Wales, Australia
Milk             India       Hyderabad, India

df1

Product_ID Unique_ID Origin_From Deliver_To
231        125         Sugar         Milk
598        125         Milk          Wheat
786        125         Rice          Sugar    
568        125         Sugar         Wheat
122        125         Wheat         Rice
269        125         Milk          Wheat

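Final output (df2): take each value in the "Origin_From" and "Deliver_To" columns of df1 and look it up in df (column Product_name). If it is found, replace the value in df1 with the matching df['City'] followed by the original Origin_From / Deliver_To value in parentheses. The output (df2) should look like this: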

df2

Product_ID  Unique_ID   Origin_From                                         Deliver_To
231         125         Singleton, New South Wales, Australia, (Sugar)      Hyderabad, India, (Milk)
598         125         Hyderabad, India, (Milk)                            Yuma, AZ, United States, (Wheat)
786         125         Chetwynd, British Columbia, Canada, (Rice)          Singleton, New South Wales, Australia, (Sugar)
568         125         Singleton, New South Wales, Australia, (Sugar)      Yuma, AZ, United States, (Wheat)
122         125         Yuma, AZ, United States, (Wheat)                    Chetwynd, British Columbia, Canada, (Rice)
269         125         Hyderabad, India, (Milk)                            Yuma, AZ, United States, (Wheat)

I'm struggling with this, so a few nudges in the right direction would really help.

Thanks in advance.

1 Answer:

Answer 0: (score: 2)

Setup

from io import StringIO
import pandas as pd

df_txt = """Product_name     Name        City
Rice             Chetwynd    Chetwynd, British Columbia, Canada
Wheat            Yuma        Yuma, AZ, United States
Sugar            Dochra      Singleton, New South Wales, Australia
Milk             India       Hyderabad, India"""

df1_txt = """Product_ID  Unique_ID  Origin_From  Deliver_To
231        125         Sugar         Milk
598        125         Milk          Wheat
786        125         Rice          Sugar
568        125         Sugar         Wheat
122        125         Wheat         Rice
269        125         Milk          Wheat"""

# Columns are separated by two or more spaces; the raw string keeps the regex backslash intact
df = pd.read_csv(StringIO(df_txt), sep=r'\s{2,}', engine='python')
df1 = pd.read_csv(StringIO(df1_txt), sep=r'\s{2,}', engine='python')
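
If parsing whitespace-aligned text is inconvenient, the same two frames can also be built directly from dictionaries. This is only a sketch of an alternative setup using the data from the question; the uniqueness check at the end is an added assumption, since a lookup keyed on Product_name only behaves predictably when each product appears once in df.

```python
import pandas as pd

# Same data as in the question, built from dicts instead of parsed text
df = pd.DataFrame({
    'Product_name': ['Rice', 'Wheat', 'Sugar', 'Milk'],
    'Name': ['Chetwynd', 'Yuma', 'Dochra', 'India'],
    'City': [
        'Chetwynd, British Columbia, Canada',
        'Yuma, AZ, United States',
        'Singleton, New South Wales, Australia',
        'Hyderabad, India',
    ],
})

df1 = pd.DataFrame({
    'Product_ID': [231, 598, 786, 568, 122, 269],
    'Unique_ID': [125] * 6,
    'Origin_From': ['Sugar', 'Milk', 'Rice', 'Sugar', 'Wheat', 'Milk'],
    'Deliver_To': ['Milk', 'Wheat', 'Sugar', 'Wheat', 'Rice', 'Wheat'],
})

# Added sanity check (an assumption of this sketch): the lookup key must be unique
assert df['Product_name'].is_unique
print(df.shape, df1.shape)  # (4, 3) (6, 4)
```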

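Solution

Option 1

# Lookup Series: the index is Product_name, the values are City
m = df.set_index('Product_name').City

df2 = df1.copy()
# Map each product to its city, then append the product name in parentheses
df2['Origin_From'] = df1.Origin_From.map(m) + ', (' + df1.Origin_From + ')'
df2['Deliver_To'] = df1.Deliver_To.map(m) + ', (' + df1.Deliver_To + ')'

df2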
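
One thing to be aware of with the mapping approach: Series.map returns NaN for any value that has no entry in the lookup, so an unmatched product would end up as NaN in the result. The question asks to replace only "if found", so here is a minimal sketch of one way to keep the original value otherwise. The helper name label_with_city and the extra product 'Salt' are hypothetical and only there to illustrate the unmatched case.

```python
import pandas as pd

df = pd.DataFrame({
    'Product_name': ['Rice', 'Milk'],
    'City': ['Chetwynd, British Columbia, Canada', 'Hyderabad, India'],
})
df1 = pd.DataFrame({
    'Product_ID': [231, 598],
    'Origin_From': ['Rice', 'Salt'],   # 'Salt' is hypothetical and absent from df
    'Deliver_To': ['Milk', 'Rice'],
})

m = df.set_index('Product_name')['City']

def label_with_city(col, lookup):
    """Hypothetical helper: 'City, (Product)' when found, else the original value."""
    labelled = col.map(lookup) + ', (' + col + ')'   # NaN where there is no match
    return labelled.fillna(col)                      # fall back to the original value

df2 = df1.copy()
for name in ['Origin_From', 'Deliver_To']:
    df2[name] = label_with_city(df1[name], m)

print(df2)
```

Whether an unmatched value should be kept, dropped, or flagged depends on the data; fillna is just one reasonable choice.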

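Option 2

m = df.set_index('Product_name').City

c = ['Origin_From', 'Deliver_To']
# Stack both columns into one Series so the mapping is applied once
fnt = df1[c].stack()
# drop(columns=c) replaces the positional axis argument, which newer pandas no longer accepts
df2 = df1.drop(columns=c).join(fnt.map(m).add(fnt.apply(', ({})'.format)).unstack())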
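
The one-liner in Option 2 packs several steps together. As a rough illustration, the same chain can be split into named intermediates on a two-row subset of the data; the variable names cities, suffix and wide are introduced here for readability and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Product_name': ['Rice', 'Milk'],
    'City': ['Chetwynd, British Columbia, Canada', 'Hyderabad, India'],
})
df1 = pd.DataFrame({
    'Product_ID': [231, 598],
    'Origin_From': ['Rice', 'Milk'],
    'Deliver_To': ['Milk', 'Rice'],
})

m = df.set_index('Product_name')['City']
c = ['Origin_From', 'Deliver_To']

fnt = df1[c].stack()                  # Series indexed by (row label, column name)
cities = fnt.map(m)                   # product -> city, same index as fnt
suffix = fnt.apply(', ({})'.format)   # 'Rice' -> ', (Rice)'
wide = cities.add(suffix).unstack()   # back to one column per original column

# Selecting wide[c] pins the column order before joining on the row index
df2 = df1.drop(columns=c).join(wide[c])
print(df2)
```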

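Option 3
Using merge

c = ['Origin_From', 'Deliver_To']
# Single-column frame whose column name matches df's key column
ds = df1[c].stack().to_frame('Product_name')
# merge joins on the shared 'Product_name' column; .values ignores the merged index
ds['City'] = ds.merge(df)['City'].values

# drop(columns=c) replaces the positional axis argument, which newer pandas no longer accepts
df2 = df1.drop(columns=c).join(ds.City.add(', (').add(ds.Product_name).add(')').unstack())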


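A more in-depth explanation of Option 3

  • For convenience, assign the target columns to the variable c.
  • Use stack to convert the 2-column dataframe into a Series object with a MultiIndex.
  • Anticipating the merge, I use to_frame to convert the Series object to a single-column dataframe. pd.merge expects dataframes (newer pandas versions also accept a named Series).
  • Anticipating further, I pass the name of the single column to the to_frame method. This is the shared column name that the merge is performed on.
  • Add a column named 'City' that holds the result of the merge. I assign through the values attribute so that the index of the merged result is ignored and only the resulting values are used.
  • ds now has the index I want in its first level. I leave it stacked while I do some convenient string operations, then I unstack. In this form the indexes are aligned and I can take advantage of join.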
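
The steps above can be sketched with the intermediate checks written out, on a two-row subset of the data. Two details here are assumptions of this sketch rather than part of the answer: the merge is given explicit on='Product_name' and how='left' arguments so that the row count and order are guaranteed to match ds even if a product were missing from df, and the names merged and labels are introduced for readability.

```python
import pandas as pd

df = pd.DataFrame({
    'Product_name': ['Rice', 'Milk'],
    'Name': ['Chetwynd', 'India'],
    'City': ['Chetwynd, British Columbia, Canada', 'Hyderabad, India'],
})
df1 = pd.DataFrame({
    'Product_ID': [231, 598],
    'Origin_From': ['Rice', 'Milk'],
    'Deliver_To': ['Milk', 'Rice'],
})

c = ['Origin_From', 'Deliver_To']

# stack -> Series with a (row label, column name) MultiIndex; to_frame names its column
ds = df1[c].stack().to_frame('Product_name')
assert ds.index.nlevels == 2

# The merge result gets a fresh 0..n-1 index, so it no longer lines up with ds by label
merged = ds.merge(df, on='Product_name', how='left')
assert len(merged) == len(ds)   # left join against unique keys keeps the row count

# Assign by position through .values, ignoring merged's index
ds['City'] = merged['City'].values

# String work while still stacked, then unstack and join back on the row index
labels = ds['City'] + ', (' + ds['Product_name'] + ')'
df2 = df1.drop(columns=c).join(labels.unstack()[c])
print(df2)
```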

I hope this is clear.