I have two DataFrames, df and df1.

df:
Product_name  Name      City
Rice          Chetwynd  Chetwynd, British Columbia, Canada
Wheat         Yuma      Yuma, AZ, United States
Sugar         Dochra    Singleton, New South Wales, Australia
Milk          India     Hyderabad, India

df1:
Product_ID  Unique_ID  Origin_From  Deliver_To
231         125        Sugar        Milk
598         125        Milk         Wheat
786         125        Rice         Sugar
568         125        Sugar        Wheat
122         125        Wheat        Rice
269         125        Milk         Wheat
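
Final output (df2): take the values of "Origin_From" and "Deliver_To" in df1 and look each one up in df (column Product_name). If a match is found, replace the value in df1 with the matching df[City] followed by the original product name in parentheses. The output (df2) should look like this.
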
DF2:
Product_ID  unique_ID  Origin_From                                     Deliver_To
231         125        Singleton, New South Wales, Australia, (Sugar)  Hyderabad, India, (Milk)
598         125        Hyderabad, India, (Milk)                        Yuma, AZ, United States, (Wheat)
786         125        Chetwynd, British Columbia, Canada, (Rice)      Singleton, New South Wales, Australia, (Sugar)
568         125        Singleton, New South Wales, Australia, (Sugar)  Yuma, AZ, United States, (Wheat)
122         125        Yuma, AZ, United States, (Wheat)                Chetwynd, British Columbia, Canada, (Rice)
269         125        Hyderabad, India, (Milk)                        Yuma, AZ, United States, (Wheat)

I am struggling with this, so a few nudges in the right direction would really help.
Thanks in advance.

Answer 0 (score: 2)

from io import StringIO
import pandas as pd

# Columns are separated by two or more spaces, so city names that
# contain single spaces stay in one field.
df_txt = """Product_name  Name  City
Rice  Chetwynd  Chetwynd, British Columbia, Canada
Wheat  Yuma  Yuma, AZ, United States
Sugar  Dochra  Singleton, New South Wales, Australia
Milk  India  Hyderabad, India"""

df1_txt = """Product_ID  Unique_ID  Origin_From  Deliver_To
231  125  Sugar  Milk
598  125  Milk  Wheat
786  125  Rice  Sugar
568  125  Sugar  Wheat
122  125  Wheat  Rice
269  125  Milk  Wheat"""

# Raw string for the regex separator; a regex sep needs the python engine.
df = pd.read_csv(StringIO(df_txt), sep=r'\s{2,}', engine='python')
df1 = pd.read_csv(StringIO(df1_txt), sep=r'\s{2,}', engine='python')
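
Option 1

# Lookup Series: Product_name -> City
m = df.set_index('Product_name').City
df2 = df1.copy()
# Map each product to its city, then append the product name in parentheses
df2.Origin_From = df1.Origin_From.map(m) + ', (' + df1.Origin_From + ')'
df2.Deliver_To = df1.Deliver_To.map(m) + ', (' + df1.Deliver_To + ')'
df2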
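
Option 1 relies on Series.map with a lookup Series. As a minimal sketch of what happens when a product has no entry in the lookup, the example below uses hypothetical miniature data (the product "Barley" and the fillna fallback are assumptions for illustration, not part of the question):

```python
import pandas as pd

# Hypothetical miniature lookup, not the asker's full data.
lookup = pd.Series(
    ["Hyderabad, India", "Yuma, AZ, United States"],
    index=["Milk", "Wheat"],
)
products = pd.Series(["Milk", "Wheat", "Barley"])  # "Barley" has no match

mapped = products.map(lookup)               # unmatched keys become NaN
labelled = mapped + ", (" + products + ")"  # NaN propagates through the concatenation

# Assumed fallback: keep the original product name where no city was found.
result = labelled.fillna(products)
print(result.tolist())
```

Whether an unmatched product should keep its name, be dropped, or raise an error depends on the data, so the fallback here is only one possible choice.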
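
Option 2

m = df.set_index('Product_name').City
c = ['Origin_From', 'Deliver_To']
# Stack both columns into one Series so the lookup runs only once
fnt = df1[c].stack()
# drop(columns=c) replaces the positional axis argument, which newer pandas rejects
df2 = df1.drop(columns=c).join(fnt.map(m).add(fnt.apply(', ({})'.format)).unstack())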
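
Option 2 depends on the stack/unstack round trip. The sketch below, on a hypothetical two-row frame, shows the intermediate shape: stack produces one Series indexed by (row label, column name), and unstack restores the wide layout after an element-wise step (str.upper stands in for the real lookup here).

```python
import pandas as pd

# Hypothetical two-row frame for illustration only.
df1 = pd.DataFrame({
    "Product_ID": [231, 598],
    "Origin_From": ["Sugar", "Milk"],
    "Deliver_To": ["Milk", "Wheat"],
})
c = ["Origin_From", "Deliver_To"]

fnt = df1[c].stack()   # Series with a two-level index: (row label, column name)
print(fnt)

# Any element-wise transformation can run once on the stacked Series,
# then unstack pivots the column names back out of the index.
wide = fnt.str.upper().unstack()
print(wide[c])
```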
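
Option 3
Using merge

c = ['Origin_From', 'Deliver_To']
ds = df1[c].stack().to_frame('Product_name')
# how='left' keeps the rows in the order of ds, which the positional .values assignment relies on
ds['City'] = ds.merge(df, how='left')['City'].values
df2 = df1.drop(columns=c).join(ds.City.add(', (').add(ds.Product_name).add(')').unstack())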
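
A deeper explanation of Option 3

c is the list of the two columns to be replaced.
stack converts the two-column DataFrame into a Series with a MultiIndex.
to_frame converts the Series into a single-column DataFrame, because pd.merge only works on DataFrames.
I name that column 'Product_name' via the to_frame method. This is the shared column name the merge is performed on.
I create a 'City' column from the result of the merge. I assign it through the values attribute so that the index of the merge result is ignored and only the resulting values are used.
ds now has the index I want in its first level. I leave it stacked while doing some convenient string operations, and then I unstack. In this form the index is aligned and I can take advantage of join.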
I hope that's clear.
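
The steps described for Option 3 can be traced one at a time. This is a sketch on hypothetical miniature frames; passing on="Product_name" and how="left" explicitly is an assumption made here so that the merge result keeps the row order of ds.

```python
import pandas as pd

# Hypothetical miniature frames for illustration only.
df = pd.DataFrame({
    "Product_name": ["Sugar", "Milk", "Wheat"],
    "City": [
        "Singleton, New South Wales, Australia",
        "Hyderabad, India",
        "Yuma, AZ, United States",
    ],
})
df1 = pd.DataFrame({
    "Product_ID": [231, 598],
    "Unique_ID": [125, 125],
    "Origin_From": ["Sugar", "Milk"],
    "Deliver_To": ["Milk", "Wheat"],
})
c = ["Origin_From", "Deliver_To"]

# Steps 1-2: stack to a MultiIndex Series, then name its single column.
ds = df1[c].stack().to_frame("Product_name")

# Step 3: the merge returns a fresh 0..n-1 index, so the cities are
# assigned back by position through .values.
merged = ds.merge(df, on="Product_name", how="left")
ds["City"] = merged["City"].values

# Step 4: build the label while still stacked, then unstack and join.
labelled = ds["City"] + ", (" + ds["Product_name"] + ")"
df2 = df1.drop(columns=c).join(labelled.unstack())
print(df2)
```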