The following code works as expected.
From: Outputting difference in two Pandas dataframes side by side - highlighting the difference
import sys
import numpy as np
import pandas as pd
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
DF1 = StringIO("""id Name score isEnrolled Comment
111 Jack 2.17 True "He was late to class"
112 Nick 1.11 False "Graduated"
113 Zoe NaN True " "
""")
DF2 = StringIO("""id Name score isEnrolled Comment
111 Jack 2.17 True "He was late to class"
112 Nick 1.21 False "Graduated"
113 Zoe NaN False "On vacation" """)
# Parse the whitespace-separated tables, using 'id' as the index
df1 = pd.read_table(DF1, sep=r'\s+', index_col='id')
df2 = pd.read_table(DF2, sep=r'\s+', index_col='id')
# Place both frames side by side under the top-level keys 'First' and 'Second'
df_all = pd.concat([df1, df2],
                   axis='columns', keys=['First', 'Second'])
# Put the column name on the outer level and drop the first column (Name)
df_final = df_all.swaplevel(axis='columns')[df1.columns[1:]]
def highlight_diff(data, color='yellow'):
    attr = 'background-color: {}'.format(color)
    other = data.xs('First', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''),
                        index=data.index, columns=data.columns)
df_final.style.apply(highlight_diff, axis=None)
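The only problem is that I don't want the first row (111), because it contains no differences.
How can I select only the rows that have changed, without using the highlight_diff function? I would like rows 112 and 113 displayed side by side, without highlighting, laid out as in Ted's answer.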
Answer 0: (score: 1)
df_select = df_final.copy()
df_select.columns = df_final.columns.swaplevel()
duplicate = (df_select['First'] == df_select['Second']).all(axis=1)
df_final = df_final[~duplicate]
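Explanation:
We create a second dataframe, df_select, to pick out the relevant rows (it is a copy of df_final, so the original is not modified). Its column levels are swapped so that First and Second sit at level 0. The rows you want to discard are then the ones where First and Second are identical. Finally, we reassign df_final so that it contains only the non-duplicate rows.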
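The steps above can be sketched end to end as follows. This is a minimal illustration, assuming the question's sample data is built directly as DataFrames rather than parsed from text; the name `changed` is introduced here for the filtered result and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question (constructed directly; an assumption
# made here so the sketch does not depend on text parsing).
idx = pd.Index([111, 112, 113], name="id")
df1 = pd.DataFrame(
    {"Name": ["Jack", "Nick", "Zoe"],
     "score": [2.17, 1.11, np.nan],
     "isEnrolled": [True, False, True],
     "Comment": ["He was late to class", "Graduated", " "]},
    index=idx,
)
df2 = pd.DataFrame(
    {"Name": ["Jack", "Nick", "Zoe"],
     "score": [2.17, 1.21, np.nan],
     "isEnrolled": [True, False, False],
     "Comment": ["He was late to class", "Graduated", "On vacation"]},
    index=idx,
)

df_all = pd.concat([df1, df2], axis="columns", keys=["First", "Second"])
df_final = df_all.swaplevel(axis="columns")[df1.columns[1:]]

# Swap the column levels on a copy so 'First'/'Second' are at level 0.
df_select = df_final.copy()
df_select.columns = df_final.columns.swaplevel()

# A row is a duplicate when every compared column matches.
# Note: NaN == NaN is False, so a NaN on both sides counts as a difference.
duplicate = (df_select["First"] == df_select["Second"]).all(axis=1)

changed = df_final[~duplicate]  # hypothetical name for the filtered frame
print(changed)
```

Row 111 matches in every compared column and is dropped; rows 112 and 113 remain, still laid out side by side.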
Edit: if you don't want df_final at all, use df_all instead:
duplicate = (df_all['First'] == df_all['Second']).drop('Comment', axis=1).all(axis=1)
result = df_all[~duplicate]
(I am assuming you do not want to check the comments, as in the earlier procedure. If you do want them checked, remove the drop.)
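To see what the drop changes, here is a small sketch on made-up data (the ids, scores, and comments below are hypothetical, not the question's): row 3 differs only in its Comment, so whether it is kept depends on whether Comment is part of the comparison.

```python
import pandas as pd

# Hypothetical toy data: row 2 differs in score, row 3 only in Comment.
idx = pd.Index([1, 2, 3], name="id")
df1 = pd.DataFrame({"score": [1.0, 2.0, 3.0],
                    "Comment": ["a", "b", "c"]}, index=idx)
df2 = pd.DataFrame({"score": [1.0, 2.5, 3.0],
                    "Comment": ["a", "b", "changed"]}, index=idx)
df_all = pd.concat([df1, df2], axis="columns", keys=["First", "Second"])

# Element-wise equality between the two halves.
equal = df_all["First"] == df_all["Second"]

# Ignore Comment when deciding whether a row is unchanged.
dup_ignoring_comment = equal.drop("Comment", axis=1).all(axis=1)
ignoring = df_all[~dup_ignoring_comment]

# Include Comment in the comparison (the drop removed).
dup_with_comment = equal.all(axis=1)
including = df_all[~dup_with_comment]

print(ignoring.index.tolist())   # ids kept when Comment is ignored
print(including.index.tolist())  # ids kept when Comment is compared
```

With the drop, only the row whose score changed survives; without it, the row whose comment changed is kept as well.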