Comparing columns based on a MultiIndex in Pandas

Posted: 2017-11-22 11:41:30

Tags: python-3.x pandas dataframe

My DataFrame df has the row labels Colour, Clothes and Week (a MultiIndex) and two columns, diff and diff_two. The index is:

MultiIndex(levels=[['Green', 'Yellow', 'Red', 'Blue', 'Black'], ['tshirt', 'jeans', 'pants', 'dress'], ['2017_46']],
           names=['Colour', 'Clothes', 'Week'],
           sortorder=0)

What is the simplest way to iterate over the rows and compare diff and diff_two, which contain strings?
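A minimal sketch, not taken from the original post: because diff and diff_two live in the same DataFrame, they already share the MultiIndex, so they can be compared column-wise without any loop. The three rows below are made-up data shaped like the question's frame; the level values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the question's layout: a three-level
# MultiIndex (Colour, Clothes, Week) and two string columns.
index = pd.MultiIndex.from_tuples(
    [("Green", "jeans", "2017_46"),
     ("Yellow", "tshirt", "2017_46"),
     ("Blue", "dress", "2017_46")],
    names=["Colour", "Clothes", "Week"],
)
df = pd.DataFrame(
    {"diff": ["Mango", "Zara", "Prada"],
     "diff_two": ["Zara", "Zara", np.nan]},
    index=index,
)

# Element-wise comparison of the two columns; both share the same index,
# so there is no need to loop over (colour, clothes) pairs.
same = df["diff"].eq(df["diff_two"])
print(same)

# The boolean mask selects the matching rows, still keyed by the MultiIndex.
matches = df[same]
print(matches)
```

The result is a boolean Series indexed by the same (Colour, Clothes, Week) tuples, so it can be used directly as a row filter.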

My idea was to loop over the DataFrame:

for colour in colour_list:
    for clothes in clothes_list:
        # Each value is tested against its own level, not as a (colour, clothes) pair
        if colour in df.index.levels[0] and clothes in df.index.levels[1]:
            rows = df.loc[(colour, clothes)]
            if (rows['diff'] == rows['diff_two']).all():
                pass  # do something

This is wrong because the if condition is always true: it does not treat the index as a tuple, i.e. (colour, clothes).
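Why the condition is always true can be shown directly: `value in df.index.levels[i]` only asks whether the value occurs somewhere in that level, not whether a row with that combination exists. The sketch below is built from the sample rows given further down (Week taken as 50) and assumes that a set of (Colour, Clothes) tuples is an acceptable way to test for a pair.

```python
import pandas as pd

# Hypothetical frame built from the question's sample rows.
df = pd.DataFrame(
    {"Colour": ["Green", "Yellow", "Blue"],
     "Clothes": ["Jeans", "Shirt", "Shirt"],
     "Week": [50, 50, 50],
     "diff": ["Mango", "Zara", "Prada"]},
).set_index(["Colour", "Clothes", "Week"])

# Per-level checks: "Blue" occurs in level 0 and "Jeans" occurs in level 1,
# so this is True even though no row has the combination (Blue, Jeans).
separate = "Blue" in df.index.levels[0] and "Jeans" in df.index.levels[1]

# Pair check: collect the (Colour, Clothes) combinations that actually occur.
pairs = set(zip(df.index.get_level_values("Colour"),
                df.index.get_level_values("Clothes")))
combined = ("Blue", "Jeans") in pairs

print(separate, combined)
```

Only the pair check reflects which rows actually exist; the per-level check is true for any value that appears anywhere in its level.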

What is the best way to compare two columns in a DataFrame with a MultiIndex?
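If the rows really do have to be visited one at a time, `DataFrame.itertuples()` yields each MultiIndex label as a tuple in `row.Index`, which avoids the separate per-level checks. This is a sketch with made-up rows; the "same"/"different" labels are placeholders for the question's "do something".

```python
import pandas as pd

# Hypothetical two-row frame with the question's index levels.
df = pd.DataFrame(
    {"Colour": ["Green", "Yellow"],
     "Clothes": ["Jeans", "Shirt"],
     "Week": [50, 50],
     "diff": ["Mango", "Zara"],
     "diff_two": ["Zara", "Zara"]},
).set_index(["Colour", "Clothes", "Week"])

results = []
for row in df.itertuples():
    # With a MultiIndex, row.Index is the full label tuple.
    colour, clothes, week = row.Index
    if row.diff == row.diff_two:
        results.append((colour, clothes, "same"))
    else:
        results.append((colour, clothes, "different"))

print(results)
```

For a plain column comparison the vectorized form is usually preferable; the loop only pays off when each row needs custom handling.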

Thanks!

Question updated with an example:

Colour    Clothes    Week    diff     diff1
Green     Jeans      50      Mango    Zara
Yellow    Shirt      50      Zara     Zara   
Blue      Shirt      50      Prada    nan
Green     Jeans      50      Zara     Zara
Green     Jeans      50      nan      nan

Updated with the desired output:

Colour    Clothes    Week    diff     diff1    output
Green     Jeans      50      Mango    Zara     Mango --> Zara
Yellow    Shirt      50      Zara     Zara     No difference
Blue      Shirt      50      Prada    nan      Prada --> nan    
Green     Jeans      50      Zara     Zara     No difference
Green     Jeans      50      nan      nan      nan --> nan
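One way to produce the desired output column is sketched below. It assumes NaN should print as "nan" and never count as equal, which is what the expected "nan --> nan" row implies. `numpy.where` picks "No difference" where the two columns match and the formatted "old --> new" string otherwise.

```python
import numpy as np
import pandas as pd

# The sample rows from the question (duplicate index labels are kept).
df = pd.DataFrame(
    {"Colour": ["Green", "Yellow", "Blue", "Green", "Green"],
     "Clothes": ["Jeans", "Shirt", "Shirt", "Jeans", "Jeans"],
     "Week": [50] * 5,
     "diff": ["Mango", "Zara", "Prada", "Zara", np.nan],
     "diff1": ["Zara", "Zara", np.nan, "Zara", np.nan]},
).set_index(["Colour", "Clothes", "Week"])

# eq() is False whenever either side is NaN, so NaN vs NaN is "changed".
equal = df["diff"].eq(df["diff1"])

# f-strings render NaN as "nan", giving e.g. "Prada --> nan".
changed = [f"{a} --> {b}" for a, b in zip(df["diff"], df["diff1"])]

df["output"] = np.where(equal, "No difference", changed)
print(df)
```

Because the comparison is done on whole columns, the MultiIndex (including its duplicate labels) plays no part in the logic.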
