Showing and visualizing the differences between two dataframes

Time: 2019-12-07 00:03:23

Tags: python pandas dataframe data-visualization

I have two dataframes for two different time periods. df_period_a is:

Vendor      Market
VendorA     MarketA
VendorA     MarketB
VendorX     MarketB
VendorZ     MarketB
VendorC     MarketX
VendorB     MarketX
VendorB     MarketA
VendorD     MarketA

and df_period_b is:

Vendor      Market
VendorA     MarketB
VendorX     MarketB
VendorZ     MarketB
VendorC     MarketB
VendorB     MarketX
VendorD     MarketX
VendorE     MarketB
VendorF     MarketC

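This means that MarketA has closed, and that a new market (MarketC) and two new vendors (VendorE and VendorF) have appeared. I would like to show this, together with any movement of vendors between markets, in a df_diff like this:
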
Source        Destination    Value
MarketX1        MarketX2        1
MarketA1        MarketX2        1
MarketB1        MarketX2        0
MarketX1        MarketB2        1
MarketB1        MarketB2        3
  -             MarketC2        1
  -             MarketB2        1

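Here, Value is the number of vendors that moved from market a in the Source period to market b in the Destination period.

Here is something I tried, which does not work: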

def get_vendor_displacement_count(market_list, df_before, df_after):
    for market in market_list:
        df_moved_vendors = pd.merge(df_before, df_after, on=['Vendor'], how='inner')
        df_moved_vendors.rename(columns={'Market_x':'Source', 'Market_y':'Target'}, inplace=True)
        df_moved_vendors['Source'] = dict_periods[len(market_list)+1]  + " " +  df_moved_vendors['Source'].astype(str)
        df_moved_vendors['Target'] = dict_periods[len(market_list)] + " " + df_moved_vendors['Target'].astype(str)
    return df_moved_vendors

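Also, is a Sankey diagram (ipysankeywidget) the most suitable chart for showing this displacement, or are there other visualizations I could look at? Thanks!

1 answer:

Answer 0 (score: 1)

You can do something like this: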

dfa1 = df_period_a.assign(Value=1).set_index(['Vendor','Market'])
dfb1 = df_period_b.assign(Value=1).set_index(['Vendor','Market'])
diff = dfa1.join(dfb1, how='outer', lsuffix='a', rsuffix='b').fillna(0).astype(int)
res = (diff.Valueb - diff.Valuea).rename('Change').reset_index().query('Change != 0')

Result:

    Vendor   Market  Change
0  VendorA  MarketA      -1
2  VendorB  MarketA      -1
4  VendorC  MarketB       1
5  VendorC  MarketX      -1
6  VendorD  MarketA      -1
7  VendorD  MarketX       1
8  VendorE  MarketB       1
9  VendorF  MarketC       1

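-1 means the vendor left that market, and 1 means the vendor entered it. Depending on what you are interested in, you can further sort the result by any of the three columns.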
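
The per-vendor changes above can also be rolled up into the Source/Destination/Value shape the question asks for. The following is only a sketch and is not part of the original answer. It assumes that a vendor listed in several period-a markets counts as moving from each of them, so some counts may differ from the hand-written example in the question. The name `moves` and the "-" placeholder for a missing side are illustrative choices.

```python
import pandas as pd

# Sample data from the question
df_period_a = pd.DataFrame({
    "Vendor": [f"Vendor{c}" for c in "AAXZCBBD"],
    "Market": [f"Market{c}" for c in "ABBBXXAA"],
})
df_period_b = pd.DataFrame({
    "Vendor": [f"Vendor{c}" for c in "AXZCBDEF"],
    "Market": [f"Market{c}" for c in "BBBBXXBC"],
})

# Pair each period-a market of a vendor with each period-b market of the
# same vendor. how="outer" keeps vendors that exist in only one period,
# leaving NaN on the side where they are missing.
moves = df_period_a.merge(
    df_period_b, on="Vendor", how="outer", suffixes=("_a", "_b")
)
moves = moves.rename(columns={"Market_a": "Source", "Market_b": "Destination"})

# Assumption: "-" marks "no market" (new or vanished vendor), as in the question
moves[["Source", "Destination"]] = moves[["Source", "Destination"]].fillna("-")

# Count vendors per (Source, Destination) pair
df_diff = (
    moves.groupby(["Source", "Destination"])
    .size()
    .rename("Value")
    .reset_index()
)
print(df_diff)
```

Rows where Source equals Destination are vendors that stayed put; filter them out with `df_diff.query('Source != Destination')` if only actual moves matter.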


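Update: a simple visualization as a heatmap (green = vendor entered the market; yellow = no change, vendor stayed in the market; red = vendor left the market; white (background) = no data, i.e. the vendor was not active in that market in either period a or period b):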

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap

# Sample data from the question
df_period_a = pd.DataFrame({'Vendor': [f'Vendor{c}' for c in 'AAXZCBBD'],
                            'Market': [f'Market{c}' for c in 'ABBBXXAA']})
df_period_b = pd.DataFrame({'Vendor': [f'Vendor{c}' for c in 'AXZCBDEF'],
                            'Market': [f'Market{c}' for c in 'BBBBXXBC']})

# Flag every (Vendor, Market) pair present in a period with Value=1
dfa1 = df_period_a.assign(Value=1).set_index(['Vendor','Market'])
dfb1 = df_period_b.assign(Value=1).set_index(['Vendor','Market'])
# The outer join keeps pairs seen in either period; the missing side becomes 0
diff = dfa1.join(dfb1, how='outer', lsuffix='a', rsuffix='b').fillna(0).astype(int)
# Change: -1 = left, 0 = stayed, 1 = entered (unchanged rows are kept here)
res = (diff.Valueb - diff.Valuea).rename('Change').reset_index()

# Three discrete colors for the three possible values of Change
cmap = ListedColormap(['red','yellow','green'])
ax = sns.heatmap(res.pivot_table(values='Change', index='Vendor', columns='Market'), cmap=cmap)
# Place one labelled tick in the middle of each color band
cb = ax.collections[0].colorbar
cb.set_ticks([-.67, 0, .67])
cb.set_ticklabels(['left', 'stayed', 'entered'])
# Keep all four spines visible so the heatmap is framed
sns.despine(left=False, bottom=False, top=False, right=False)
plt.show()

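
On the Sankey part of the question: a Sankey diagram is drawn from a list of source/target/value links, so the data first has to be put into that shape. The sketch below is not from the original answer. It builds such a list with plain pandas, suffixing market names with 1 (period a) and 2 (period b) as in the question's table so the two periods form separate node columns. Passing the list to `SankeyWidget(links=...)` is shown only as a comment and assumes ipysankeywidget is installed in a Jupyter environment.

```python
import pandas as pd

# Sample data from the question
df_period_a = pd.DataFrame({
    "Vendor": [f"Vendor{c}" for c in "AAXZCBBD"],
    "Market": [f"Market{c}" for c in "ABBBXXAA"],
})
df_period_b = pd.DataFrame({
    "Vendor": [f"Vendor{c}" for c in "AXZCBDEF"],
    "Market": [f"Market{c}" for c in "BBBBXXBC"],
})

# The inner merge keeps only vendors present in both periods, since a flow
# needs both a start and an end node.
moves = df_period_a.merge(df_period_b, on="Vendor", suffixes=("_a", "_b"))

# One link per (period-a market, period-b market) pair; the 1/2 suffixes
# keep "MarketB1" and "MarketB2" as distinct nodes.
links = [
    {"source": f"{src}1", "target": f"{dst}2", "value": int(n)}
    for (src, dst), n in moves.groupby(["Market_a", "Market_b"]).size().items()
]
print(links)

# Assumption: with ipysankeywidget available, the links could be rendered as
# from ipysankeywidget import SankeyWidget
# SankeyWidget(links=links)
```

New vendors (VendorE, VendorF) have no source node and are dropped by the inner merge; they would need a placeholder source node if they should appear in the diagram.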