Replace text values in a dataframe using another dataframe

Posted: 2019-05-01 09:17:22

Tags: python python-3.x pandas dataframe

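I have a simple dataframe (df1) in which I replace values with the replace function (see below). Instead of editing the names of the items to be replaced in the code, I would like to maintain them in an Excel sheet, where each column (or row) lists the different names that should be replaced by one common name. I would import that Excel sheet as a dataframe (df2). What I am missing is the code that turns the information in df2 into arguments for the replace function.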

import pandas as pd

df1 = pd.DataFrame({'Product': ['Tart', 'Cookie', 'Black'],
                    'Quantity': [1234, 4, 333]})

print(df1)
   Product  Quantity
0     Tart      1234
1   Cookie         4
2    Black       333

This is what I have been using so far:

df1 = df1.replace(["Tart", "Tart2", "Cookie", "Cookie2"], "Tartlet")
df1 = df1.replace(["Ham and cheese Sandwich", "Chicken focaccia"], "Sandwich")

After the replacement:

print(df1)
   Product  Quantity
0  Tartlet      1234
1  Tartlet         4
2    Black       333

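This is what the second dataframe (df2) looks like after importing it from the Excel file (I am flexible about how it is laid out):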

df2 = pd.read_excel(setup_folder / "Product Replacements.xlsx", index_col=0)

print(df2)
   Tartlet                 Sandwich
0     Tart  Ham and cheese Sandwich
1    Tart2         Chicken Focaccia
2  Cookie2                      NaN
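Since the layout of the sheet is flexible, one option worth considering is a "long" layout with one row per name to replace. The column names `Old` and `New` below are hypothetical, and the sheet is built inline instead of being read with `pd.read_excel`; this is a sketch under those assumptions, not the answer's method. Each row is already an old-to-new pair, so the mapping needs no key/value swapping and there are no blank cells to pad shorter groups.

```python
import pandas as pd

# Hypothetical long layout: one row per name to replace.
# In practice this would come from pd.read_excel(...); it is built inline here.
df2_long = pd.DataFrame({
    'Old': ['Tart', 'Tart2', 'Cookie', 'Cookie2',
            'Ham and cheese Sandwich', 'Chicken Focaccia'],
    'New': ['Tartlet', 'Tartlet', 'Tartlet', 'Tartlet',
            'Sandwich', 'Sandwich'],
})

df1 = pd.DataFrame({'Product': ['Tart', 'Cookie', 'Black'],
                    'Quantity': [1234, 4, 333]})

# Each row is an old -> new pair, so zip the two columns straight into a dict
mapping = dict(zip(df2_long['Old'], df2_long['New']))

# Values not present in the mapping (e.g. 'Black') are left unchanged
df1['Product'] = df1['Product'].replace(mapping)
print(df1)
```

Adding a new product alias then only means appending a row to the sheet.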

1 answer

Answer 0 (score: 1)

Use:

df2 = pd.DataFrame({'Tartlet':['Tart', 'Tart2', 'Cookie'],
                    'Sandwich': ['Ham and Cheese Sandwich', 'Chicken Focaccia', 'another']})

# swap keys and values: each old name becomes a key mapped to its column header
# http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in df2.items() for k in oldv}
print (d1)
{'Tart': 'Tartlet', 'Tart2': 'Tartlet', 'Cookie': 'Tartlet', 'Ham and Cheese Sandwich': 'Sandwich', 'Chicken Focaccia': 'Sandwich', 'another': 'Sandwich'}
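When df2 is read from Excel as in the question, shorter columns are padded with NaN (the blank cell under Sandwich). Fed directly into the comprehension above, that NaN would become a key in `d1`, which is not a real product name. A minimal sketch, assuming the question's wide layout, that drops the padding with `Series.dropna()` before swapping keys and values:

```python
import numpy as np
import pandas as pd

# Wide layout as in the question: one column per new name,
# with a blank Excel cell showing up as NaN in the shorter column
df2 = pd.DataFrame({'Tartlet': ['Tart', 'Tart2', 'Cookie2'],
                    'Sandwich': ['Ham and cheese Sandwich', 'Chicken Focaccia', np.nan]})

# Swap keys and values, skipping the NaN padding in each column
d1 = {old: new for new, col in df2.items() for old in col.dropna()}
print(d1)
```

The resulting dictionary has only the five real names as keys, so it can be passed to `replace` or `map` exactly as in the answer.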

df1['Product'] = df1['Product'].replace(d1)
# faster alternative on large data
#df1['Product'] = df1['Product'].map(d1).fillna(df1['Product'])
print (df1)
   Product  Quantity
0  Tartlet      1234
1  Tartlet         4
2    Black       333
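The commented-out `map` line works differently from `replace`: `Series.map` with a dict returns NaN for any value that is not a key, so `fillna(df1['Product'])` puts the original text back for those rows. A small sketch comparing the two on the question's data (the dictionary is a trimmed copy of `d1` above):

```python
import pandas as pd

df1 = pd.DataFrame({'Product': ['Tart', 'Cookie', 'Black'],
                    'Quantity': [1234, 4, 333]})
d1 = {'Tart': 'Tartlet', 'Tart2': 'Tartlet', 'Cookie': 'Tartlet',
      'Ham and Cheese Sandwich': 'Sandwich', 'Chicken Focaccia': 'Sandwich'}

# replace: values that are not keys in d1 are left untouched
via_replace = df1['Product'].replace(d1)

# map: values that are not keys become NaN, so fillna restores the original text
via_map = df1['Product'].map(d1).fillna(df1['Product'])

print(via_replace.tolist())  # ['Tartlet', 'Tartlet', 'Black']
print(via_map.tolist())      # ['Tartlet', 'Tartlet', 'Black']
```

Without the `fillna`, the unmatched 'Black' row would end up as NaN instead of keeping its name.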