I have two DataFrames, as shown below:
XYZ
Year Quantity Car Colour
2001 1000 Swift Red
2001 16 Wagonar White
2001 16 Wagonar Black
2001 200 Baleno Silver
2001 20 Zen White
ABC
Year Quantity Car Colour
2001 1000 Swift Red
2001 16 Wagonar White
2001 200 Baleno Silver
2001 44 Alto Blue
The output should look like this:
Year        Quantity    Car               Colour
XYZ   ABC   XYZ   ABC   XYZ      ABC      XYZ     ABC
2001  2001  1000  1000  Swift    Swift    Red     Red
2001  2001  16    16    Wagonar  Wagonar  White   White
2001  2001  16          Wagonar           Black
2001  2001  200   200   Baleno   Baleno   Silver  Silver
2001  2001  20          Zen               White
2001  2001        44             Alto             Blue
This is what I have tried so far:
df_all = pd.concat([df_temp, df_temp1], axis='columns', keys=['XYZ', 'ABC'])
print(df_all)
df_final = df_all.swaplevel(axis='columns')[df_temp.columns]
print(df_final)
def highlight_diff(data, color='yellow'):
    attr = 'background-color: {}'.format(color)
    other = data.xs('First', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''),
                        index=data.index, columns=data.columns)
df_final.style.apply(highlight_diff, axis=None)
print(df_final)
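The differences between the DataFrames should be highlighted.
For example, in this case the cars Wagonar, Zen and Alto must be highlighted, because they differ between the two DataFrames.
I also tried concatenating them this way: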
YEAR Quantity CAR COLOR car color
0 2001 16 Wagonar white Wagonar white
1 2001 16 Wagonar black Wagonar white
2 2001 20 Zen white NaN NaN
3 2001 44 NaN NaN Alto blue
4 2001 200 Baleno silver Baleno silver
5 2001 1000 Swift red Swift red
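All the upper-case headers belong to company XYZ and the lower-case ones to ABC. How can I compare the "CAR" column with the "car" column, and the "COLOR" column with the "color" column, and highlight the entire row wherever the values do not match?
I tried: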
def highlight_rows(s):
    if not (s['CAR'] == s['car'] and s['COLOR'] == s['color']):
        return 'background-color: green'
df_final.style.apply(highlight_rows, axis = None)
But this does not work.
Answer 0 (score: 0)
The problem is the duplicated Year and Quantity pairs, so one possible solution is to build a unique MultiIndex with a cumcount counter before calling concat:
df_temp.index = df_temp.groupby(['Year','Quantity']).cumcount()
df_temp1.index = df_temp1.groupby(['Year','Quantity']).cumcount()
df_all = (pd.concat([df_temp.set_index(['Year','Quantity'], append=True),
                     df_temp1.set_index(['Year','Quantity'], append=True)],
                    axis='columns',
                    keys=['XYZ', 'ABC']))
print(df_all)
                     XYZ              ABC
                     Car  Colour      Car  Colour
  Year Quantity
0 2001 16        Wagonar   White  Wagonar   White
       20            Zen   White      NaN     NaN
       44            NaN     NaN     Alto    Blue
       200        Baleno  Silver   Baleno  Silver
       1000        Swift     Red    Swift     Red
1 2001 16        Wagonar   Black      NaN     NaN
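To see why the counter is needed, here is a minimal sketch that rebuilds only the XYZ table from the question. The name df_xyz is a hypothetical stand-in for df_temp; the data is copied from the question.

```python
import pandas as pd

# Hypothetical stand-in for df_temp (the XYZ frame from the question).
df_xyz = pd.DataFrame({
    "Year": [2001] * 5,
    "Quantity": [1000, 16, 16, 200, 20],
    "Car": ["Swift", "Wagonar", "Wagonar", "Baleno", "Zen"],
    "Colour": ["Red", "White", "Black", "Silver", "White"],
})

# cumcount numbers the rows inside each (Year, Quantity) group: 0, 1, 2, ...
counter = df_xyz.groupby(["Year", "Quantity"]).cumcount()
print(counter.tolist())

# Indexing by (Year, Quantity) alone leaves a duplicated key for the two
# Wagonar rows; adding the counter as an extra level makes every key unique.
plain = df_xyz.set_index(["Year", "Quantity"])
df_xyz.index = counter
keyed = df_xyz.set_index(["Year", "Quantity"], append=True)
print(plain.index.is_unique, keyed.index.is_unique)
```

Only the second Wagonar row gets counter 1, which is why it ends up in its own block at the bottom of the concatenated output above.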
Then convert the MultiIndex to a DataFrame with index.to_frame and concat it with itself under the same keys:
df = df_all.index.to_frame().drop(0, axis=1)
df1 = pd.concat([df, df], axis=1, keys=('XYZ','ABC'))
print (df1)
                  XYZ             ABC
                 Year  Quantity  Year  Quantity
  Year Quantity
0 2001 16        2001        16  2001        16
       20        2001        20  2001        20
       44        2001        44  2001        44
       200       2001       200  2001       200
       1000      2001      1000  2001      1000
1 2001 16        2001        16  2001        16
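The drop(0, axis=1) call is easier to follow on a tiny index. The sketch below uses a hand-built MultiIndex (the name idx and its three entries are illustrative, not the full data) whose first level is unnamed, like the cumcount level.

```python
import pandas as pd

# Illustrative index: an unnamed counter level plus Year and Quantity.
idx = pd.MultiIndex.from_tuples(
    [(0, 2001, 16), (0, 2001, 20), (1, 2001, 16)],
    names=[None, "Year", "Quantity"],
)

# to_frame turns each level into a column; the unnamed level is labelled
# by its position, 0, which is the column that drop(0, axis=1) removes.
frame = idx.to_frame()
print(frame.columns.tolist())

keys = frame.drop(0, axis=1)
print(keys)
```

The resulting frame keeps the original MultiIndex as its row index, so it can later be joined back onto df_all row by row.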
df_final = df_all.join(df1).reset_index(drop=True).swaplevel(axis='columns')[df_temp.columns]
print(df_final)
   Year        Quantity            Car           Colour
    XYZ   ABC       XYZ   ABC      XYZ      ABC     XYZ     ABC
0  2001  2001        16    16  Wagonar  Wagonar   White   White
1  2001  2001        20    20      Zen      NaN   White     NaN
2  2001  2001        44    44      NaN     Alto     NaN    Blue
3  2001  2001       200   200   Baleno   Baleno  Silver  Silver
4  2001  2001      1000  1000    Swift    Swift     Red     Red
5  2001  2001        16    16  Wagonar      NaN   Black     NaN
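Before styling, it can help to check which cells actually differ between the two sides. This is a small sketch on a hand-built frame with the same column layout as the table above; df_cmp and its three rows are an abbreviated, hypothetical stand-in for df_final.

```python
import numpy as np
import pandas as pd

# Same layout as the table above: field on level 0, source on level 1.
cols = pd.MultiIndex.from_product([["Car", "Colour"], ["XYZ", "ABC"]])
df_cmp = pd.DataFrame(
    [["Wagonar", "Wagonar", "White", "White"],
     ["Zen", np.nan, "White", np.nan],
     [np.nan, "Alto", np.nan, "Blue"]],
    columns=cols,
)

# xs on the last column level selects one source and drops that level,
# leaving plain Car / Colour columns that line up for comparison.
xyz = df_cmp.xs("XYZ", axis="columns", level=-1)
abc = df_cmp.xs("ABC", axis="columns", level=-1)

# ne is element-wise "not equal"; a NaN on either side counts as different.
differs = xyz.ne(abc)
print(differs)
print(differs.any(axis=1).tolist())
```

A boolean frame like differs is a useful thing to inspect when a style function does not colour the cells you expect.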
Finally, add new masks and combine them with a bitwise OR (|):
def highlight_diff(data, color='yellow'):
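    attr = 'background-color: {}'.format(color)
    # one slice per source, with the XYZ/ABC level removed
    other1 = data.xs('XYZ', axis='columns', level=-1)
    other2 = data.xs('ABC', axis='columns', level=-1)
    # a cell is highlighted if it differs from either source's value
    return pd.DataFrame(np.where(data.ne(other1, level=0) |
                                 data.ne(other2, level=0), attr, ''),
                        index=data.index, columns=data.columns)

df_final.style.apply(highlight_diff, axis=None)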
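For the flat layout from the second half of the question (CAR/COLOR against car/color), a row-wise variant might look like the sketch below. It assumes the intent is to colour the whole row on any mismatch; df_flat, same_value and the reworked highlight_rows are illustrative names, and the data is copied from the question's table. The original attempt most likely failed because, with axis=None, Styler.apply passes the whole DataFrame and expects a same-shaped DataFrame of CSS strings back, while the function returned a single string or None.

```python
import numpy as np
import pandas as pd

# Flat table from the question: upper-case columns are XYZ, lower-case are ABC.
df_flat = pd.DataFrame({
    "YEAR": [2001] * 6,
    "Quantity": [16, 16, 20, 44, 200, 1000],
    "CAR": ["Wagonar", "Wagonar", "Zen", np.nan, "Baleno", "Swift"],
    "COLOR": ["white", "black", "white", np.nan, "silver", "red"],
    "car": ["Wagonar", "Wagonar", np.nan, "Alto", "Baleno", "Swift"],
    "color": ["white", "white", np.nan, "blue", "silver", "red"],
})

def same_value(a, b):
    """True only when both values are present and equal."""
    return (not pd.isna(a)) and (not pd.isna(b)) and a == b

def highlight_rows(row, color="green"):
    """Return one CSS string per cell; the whole row is coloured on a mismatch."""
    match = (same_value(row["CAR"], row["car"])
             and same_value(row["COLOR"], row["color"]))
    css = "" if match else "background-color: {}".format(color)
    return pd.Series(css, index=row.index)

# Applying the function directly shows the style table a Styler would use.
row_styles = df_flat.apply(highlight_rows, axis=1)
print(row_styles)
```

In a notebook the same function can be passed to the Styler row by row with df_flat.style.apply(highlight_rows, axis=1); note that the Styler needs jinja2 installed and only affects rendered output, so print(df_flat) will never show colours.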