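I'm trying to compare two DataFrames and find the values that differ between items. One of the DataFrames has MultiIndex columns that hold the data needed for the comparison. Here is an example: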
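import pandas as pd

weight = [1,5,2,4]
price = [2,6,3,5]
item = ['A','B','A','B']
date = ['20-12-2020', '21-12-2020', '20-12-2020', '21-12-2020']
# DF2 is in long format: one row per observation, with the item in its own column
DF2 = pd.DataFrame({'Date':date, 'weight':weight, 'price':price, 'item':item})
# DF1 is in wide format: MultiIndex columns of (item, measure)
tuples = [('A', 'weight'), ('A', 'price'), ('B', 'weight'), ('B', 'price')]
index = pd.MultiIndex.from_tuples(tuples)
DF1 = pd.DataFrame(columns = index)
DF1['A','weight'] = [1,2]
DF1['A', 'price'] = [2,3]
DF1['B', 'weight'] = [5,4]
DF1['B', 'price'] = [6,5]
# rename returns a new DataFrame; assign it back (DF1 = DF1.rename(...)) to keep the date labels
DF1.rename(index={0:'20-12-2020', 1:'21-12-2020'})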
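The goal is to find the differences in weight and price between DF1 and DF2 for a given item and date. I don't know how to proceed because of the MultiIndex in DF1: its column levels hold the item as well as the data I need.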
Answer 0 (score: 0)
The simplest way is:
DF1[('C','diff weight')] = (DF1[('A','weight')] - DF1[('B','weight')])
DF1[('C','diff price')] = (DF1[('A','price')] - DF1[('B','price')])
which gives:
       A            B                 C
  weight price weight price diff weight diff price
0      1     2      5     6          -4         -4
1      2     3      4     5          -2         -2
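If there are more measures than weight and price, the two assignments above can be generated in a loop rather than written by hand. The following is only a sketch: the lowercase name df1 and the loop variable measure are placeholders, and it assumes that items 'A' and 'B' carry the same second-level column labels.

```python
import pandas as pd

# Rebuild the wide frame from the question (default 0/1 index).
columns = pd.MultiIndex.from_tuples(
    [("A", "weight"), ("A", "price"), ("B", "weight"), ("B", "price")]
)
df1 = pd.DataFrame([[1, 2, 5, 6], [2, 3, 4, 5]], columns=columns)

# df1["A"] selects the sub-frame under the top-level label "A";
# its columns are the measure names shared with "B" (assumption).
for measure in df1["A"].columns:
    df1[("C", f"diff {measure}")] = df1[("A", measure)] - df1[("B", measure)]

print(df1)
```

This keeps the original columns and adds one 'C' column per measure, the same layout as the hand-written version.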
Alternatively:
DF1 = DF1['A'] - DF1['B']
DF1.columns = pd.MultiIndex.from_tuples([('diff',col) for col in DF1.columns])
which gives:
    diff
  weight price
0     -4    -4
1     -2    -2
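Both approaches above compare item A with item B inside DF1. If the aim is the comparison the question describes, DF1 against DF2 per item and date, one possible route is to reshape the long frame into the same wide layout and subtract. This is a sketch under stated assumptions, not a drop-in solution: it assumes DF2 has exactly one row per (Date, item) pair (pivot raises an error on duplicates), that DF1 is indexed by date, and it uses made-up values with a single deliberate mismatch so that the difference is visible. The names df1, df2, df2_wide and diff are placeholders.

```python
import pandas as pd

# Wide frame: MultiIndex columns (item, measure), one row per date.
columns = pd.MultiIndex.from_tuples(
    [("A", "weight"), ("A", "price"), ("B", "weight"), ("B", "price")]
)
df1 = pd.DataFrame(
    [[1, 2, 5, 6], [2, 3, 4, 5]],
    index=["20-12-2020", "21-12-2020"],
    columns=columns,
)

# Long frame: illustrative data, one row per (Date, item) pair (assumption).
# The weight of item B on 21-12-2020 is set to 7 on purpose so it differs from df1.
df2 = pd.DataFrame({
    "Date": ["20-12-2020", "20-12-2020", "21-12-2020", "21-12-2020"],
    "item": ["A", "B", "A", "B"],
    "weight": [1, 5, 2, 7],
    "price": [2, 6, 3, 5],
})

# Pivot to wide form; the columns come out as (measure, item).
df2_wide = df2.pivot(index="Date", columns="item", values=["weight", "price"])
# Swap the column levels to (item, measure), then match df1's row and column order.
df2_wide = df2_wide.swaplevel(axis=1).reindex(index=df1.index, columns=df1.columns)

# Element-wise difference; nonzero cells mark where the two frames disagree.
diff = df1 - df2_wide
print(diff)
```

Cells equal to zero are values on which the two frames agree. If DF2 can contain several rows for the same date and item, it would need to be aggregated first, for example with pivot_table and a chosen aggfunc.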