How to highlight differences in a pandas DataFrame after concatenation?

Asked: 2019-02-26 05:59:45

Tags: python python-3.x pandas dataframe

I have two DataFrames, as shown below:

XYZ
Year Quantity Car     Colour
2001 1000     Swift   Red
2001 16       Wagonar White
2001 16       Wagonar Black
2001 200      Baleno  Silver
2001 20       Zen     White

ABC  
Year Quantity Car     Colour
2001 1000     Swift   Red
2001 16       Wagonar White
2001 200      Baleno  Silver
2001 44       Alto    Blue

The output should look like this:

Year      Quantity Car             Colour
XYZ  ABC  XYZ  ABC XYZ     ABC     XYZ    ABC
2001 2001 1000 100 Swift   Swift   Red    Red
2001 2001 16   16  Wagonar Wagonar White  White
2001 2001 16       Wagonar         Black 
2001 2001 200  200 Baleno  Baleno  Silver Silver
2001 2001 20       Zen             White
2001 2001      44          Alto           Blue

I have tried this:

df_all = pd.concat([df_temp, df_temp1], axis='columns', keys=['XYZ', 'ABC'])
print(df_all)
df_final = df_all.swaplevel(axis='columns')[df_temp.columns]
print(df_final)
def highlight_diff(data, color='yellow'):
    attr = 'background-color: {}'.format(color)
    other = data.xs('First', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other, level=0), attr,''),index=data.index, columns=data.columns)

df_final.style.apply(highlight_diff, axis=None)
print(df_final)

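The differences between the DataFrames should be highlighted.

For example, in this case the cars Wagonar, Zen and Alto must be highlighted, because they differ between the two DataFrames.

I also tried concatenating them this way: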

    YEAR Quantity  CAR    COLOR  car     color
0   2001    16    Wagonar white  Wagonar white
1   2001    16    Wagonar black  Wagonar white
2   2001    20    Zen     white  NaN     NaN
3   2001    44    NaN     NaN    Alto    blue
4   2001   200    Baleno  silver Baleno  silver
5   2001  1000    Swift   red    Swift   red

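All uppercase headers belong to company xyz and the lowercase headers belong to abc. How can I compare the "CAR" column with the "car" column and the "COLOR" column with the "color" column, and highlight the entire row wherever the values do not match?

I tried: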

def highlight_rows(s):
    if not (s['CAR'] == s['car'] and s['COLOR'] == s['color']):
        return 'background-color: green'

df_final.style.apply(highlight_rows, axis = None)

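But this does not work.

1 answer:

Answer 0 (score: 0):

The problem is the duplicated Year and Quantity pairs, so one possible solution is to create a unique MultiIndex with a counter (cumcount) before calling concat: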

df_temp.index = df_temp.groupby(['Year','Quantity']).cumcount()
df_temp1.index = df_temp1.groupby(['Year','Quantity']).cumcount()

df_all = (pd.concat([df_temp.set_index(['Year','Quantity'], append=True), 
                     df_temp1.set_index(['Year','Quantity'], append=True)], 
                     axis='columns', 
                     keys=['XYZ', 'ABC']))
print(df_all)
                     XYZ              ABC        
                     Car  Colour      Car  Colour
  Year Quantity                                  
0 2001 16        Wagonar   White  Wagonar   White
       20            Zen   White      NaN     NaN
       44            NaN     NaN     Alto    Blue
       200        Baleno  Silver   Baleno  Silver
       1000        Swift     Red    Swift     Red
1 2001 16        Wagonar   Black      NaN     NaN

Then convert the MultiIndex of the index to a DataFrame and concat it with itself under both keys:

df = df_all.index.to_frame().drop(0, axis=1)
df1 = pd.concat([df, df], axis=1, keys=('XYZ','ABC'))
print(df1)
                  XYZ            ABC         
                 Year Quantity  Year Quantity
  Year Quantity                              
0 2001 16        2001       16  2001       16
       20        2001       20  2001       20
       44        2001       44  2001       44
       200       2001      200  2001      200
       1000      2001     1000  2001     1000
1 2001 16        2001       16  2001       16

df_final = df_all.join(df1).reset_index(drop=True).swaplevel(axis='columns')[df_temp.columns]
print(df_final)
   Year       Quantity            Car            Colour        
    XYZ   ABC      XYZ   ABC      XYZ      ABC      XYZ     ABC
0  2001  2001       16    16  Wagonar  Wagonar    White   White
1  2001  2001       20    20      Zen      NaN    White     NaN
2  2001  2001       44    44      NaN     Alto      NaN    Blue
3  2001  2001      200   200   Baleno   Baleno   Silver  Silver
4  2001  2001     1000  1000    Swift    Swift      Red     Red
5  2001  2001       16    16  Wagonar      NaN    Black     NaN

Finally, add a second mask and combine the two with bitwise OR (|):

def highlight_diff(data, color='yellow'):
    attr = 'background-color: {}'.format(color)
    other1 = data.xs('XYZ', axis='columns', level=-1)
    other2 = data.xs('ABC', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other1, level=0) | 
                                 data.ne(other2, level=0), attr,''),
                        index=data.index, columns=data.columns)
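
As a cross-check, the same idea can be written with an explicit per-column comparison. The sketch below is an alternative formulation, not the answer's code: the function name highlight_mismatch is hypothetical, and df_final is rebuilt by hand from the printed result above so that the block runs on its own. It returns a DataFrame of CSS strings with the same shape as the data, which is the form Styler.apply expects when axis=None.

```python
import numpy as np
import pandas as pd

# df_final rebuilt by hand from the printed output above (assumption:
# first column level is the field name, second level is the source).
columns = pd.MultiIndex.from_product(
    [['Year', 'Quantity', 'Car', 'Colour'], ['XYZ', 'ABC']]
)
rows = [
    [2001, 2001, 16, 16, 'Wagonar', 'Wagonar', 'White', 'White'],
    [2001, 2001, 20, 20, 'Zen', np.nan, 'White', np.nan],
    [2001, 2001, 44, 44, np.nan, 'Alto', np.nan, 'Blue'],
    [2001, 2001, 200, 200, 'Baleno', 'Baleno', 'Silver', 'Silver'],
    [2001, 2001, 1000, 1000, 'Swift', 'Swift', 'Red', 'Red'],
    [2001, 2001, 16, 16, 'Wagonar', np.nan, 'Black', np.nan],
]
df_final = pd.DataFrame(rows, columns=columns)


def highlight_mismatch(data, color='yellow'):
    """Hypothetical helper: CSS for every cell whose XYZ and ABC values differ."""
    attr = 'background-color: {}'.format(color)
    styles = pd.DataFrame('', index=data.index, columns=data.columns)
    for name in data.columns.get_level_values(0).unique():
        # NaN never compares equal, so a value missing on one side counts
        # as a difference.
        differs = data[(name, 'XYZ')].ne(data[(name, 'ABC')])
        cell_css = np.where(differs, attr, '')
        styles[(name, 'XYZ')] = cell_css
        styles[(name, 'ABC')] = cell_css
    return styles


styles = highlight_mismatch(df_final)
print(styles)

# In a notebook (Styler needs jinja2 installed):
# df_final.style.apply(highlight_mismatch, axis=None)
```

Note that a Styler does not modify the DataFrame, so print(df_final) never shows colours; the styled object has to be displayed in a notebook or exported to HTML.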

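
For the flat layout in the question (uppercase columns for xyz, lowercase columns for abc), whole rows can be highlighted with a row-wise function. With axis=None, Styler passes the entire DataFrame to the function, so s['CAR'] == s['car'] is a Series and cannot be used with "and"; the function also has to return one style per cell rather than a single string or None. A minimal sketch, assuming the frame df_flat below matches the table typed in the question:

```python
import numpy as np
import pandas as pd

# Flat frame copied from the question: CAPS columns are xyz, lowercase are abc.
df_flat = pd.DataFrame({
    'YEAR': [2001, 2001, 2001, 2001, 2001, 2001],
    'Quantity': [16, 16, 20, 44, 200, 1000],
    'CAR': ['Wagonar', 'Wagonar', 'Zen', np.nan, 'Baleno', 'Swift'],
    'COLOR': ['white', 'black', 'white', np.nan, 'silver', 'red'],
    'car': ['Wagonar', 'Wagonar', np.nan, 'Alto', 'Baleno', 'Swift'],
    'color': ['white', 'white', np.nan, 'blue', 'silver', 'red'],
})


def highlight_rows(row, color='green'):
    """Return one CSS string per cell; the whole row is coloured on mismatch."""
    same = row['CAR'] == row['car'] and row['COLOR'] == row['color']
    style = '' if same else 'background-color: {}'.format(color)
    return [style] * len(row)


# Plain-Python check of which rows would be coloured (no Styler required).
flagged = [i for i, row in df_flat.iterrows() if highlight_rows(row)[0]]
print(flagged)

# In a notebook (Styler needs jinja2 installed):
# df_flat.style.apply(highlight_rows, axis=1)
```

With axis=1 the function is called once per row and receives that row as a Series, so the scalar comparisons behave as the original attempt intended.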