I have two DataFrames for different time periods. df_period_a is:
Vendor Market
VendorA MarketA
VendorA MarketB
VendorX MarketB
VendorZ MarketB
VendorC MarketX
VendorB MarketX
VendorB MarketA
VendorD MarketA
and df_period_b is:
Vendor Market
VendorA MarketB
VendorX MarketB
VendorZ MarketB
VendorC MarketB
VendorB MarketX
VendorD MarketX
VendorE MarketB
VendorF MarketC
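This means that MarketA has closed, and that a new market, MarketC, and two new vendors, VendorE and VendorF, have appeared. I would like to build df_diff like this: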
Source Destination Value
MarketX1 MarketX2 1
MarketA1 MarketX2 1
MarketB1 MarketX2 0
MarketX1 MarketB2 1
MarketB1 MarketB2 3
- MarketC2 1
- MarketB2 1
Here Value is the number of vendors that moved from the Source market in period a to the Destination market in period b.
What I have tried so far does not work:
def get_vendor_displacement_count(market_list, df_before, df_after):
    for market in market_list:
        df_moved_vendors = pd.merge(df_before, df_after, on=['Vendor'], how='inner')
        df_moved_vendors.rename(columns={'Market_x':'Source', 'Market_y':'Target'}, inplace=True)
        df_moved_vendors['Source'] = dict_periods[len(market_list)+1] + " " + df_moved_vendors['Source'].astype(str)
        df_moved_vendors['Target'] = dict_periods[len(market_list)] + " " + df_moved_vendors['Target'].astype(str)
    return df_moved_vendors
Also, is a Sankey diagram (ipysankeywidget) the most suitable plot for showing this displacement, or are there other visualizations I could look at? Thanks!
Answer 0 (score: 1)
You can do the following:
dfa1 = df_period_a.assign(Value=1).set_index(['Vendor','Market'])
dfb1 = df_period_b.assign(Value=1).set_index(['Vendor','Market'])
diff = dfa1.join(dfb1, how='outer', lsuffix='a', rsuffix='b').fillna(0).astype(int)
res = (diff.Valueb - diff.Valuea).rename('Change').reset_index().query('Change != 0')
Result:
Vendor Market Change
0 VendorA MarketA -1
2 VendorB MarketA -1
4 VendorC MarketB 1
5 VendorC MarketX -1
6 VendorD MarketA -1
7 VendorD MarketX 1
8 VendorE MarketB 1
9 VendorF MarketC 1
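A Change of -1 means the vendor left that market, and 1 means the vendor entered it. Depending on what you are interested in, you can further sort the result by any of the three columns.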
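The Change table above is per vendor and market, whereas the question asks for Source/Destination counts. The following is only a rough sketch of one way to get that shape, under my own assumptions: vendors are paired across periods with an outer merge on Vendor, a vendor present in only one period gets '-' as a placeholder, and a vendor listed in several markets contributes one row per market pair. Because of that last assumption the counts will not match the df_diff in the question exactly (for example, MarketA to MarketX comes out as 2 here). The names pairs, Market_a and Market_b are illustrative only.

```python
import pandas as pd

# Example data copied from the question
df_period_a = pd.DataFrame({
    'Vendor': ['Vendor' + c for c in 'AAXZCBBD'],
    'Market': ['Market' + c for c in 'ABBBXXAA'],
})
df_period_b = pd.DataFrame({
    'Vendor': ['Vendor' + c for c in 'AXZCBDEF'],
    'Market': ['Market' + c for c in 'BBBBXXBC'],
})

# Pair each vendor's period-a markets with its period-b markets.
# An outer merge keeps vendors that exist in only one of the periods.
pairs = df_period_a.merge(
    df_period_b, on='Vendor', how='outer', suffixes=('_a', '_b')
)

# Assumption: '-' marks "no market in that period" (new or departed vendor)
pairs = pairs.fillna('-')

# Count vendor rows per (Source, Destination) pair
df_diff = (
    pairs.rename(columns={'Market_a': 'Source', 'Market_b': 'Destination'})
         .groupby(['Source', 'Destination'])
         .size()
         .reset_index(name='Value')
)
print(df_diff)
```

If each vendor should be counted only once, the period-a frame would first have to be reduced to one market per vendor; which market to keep is a modelling decision the question does not settle.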
A heatmap is one way to visualize the result:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap

# Rebuild the two example frames from the question
df_period_a = pd.DataFrame({'Vendor': ['Vendor' + c for c in 'AAXZCBBD'], 'Market': ['Market' + c for c in 'ABBBXXAA']})
df_period_b = pd.DataFrame({'Vendor': ['Vendor' + c for c in 'AXZCBDEF'], 'Market': ['Market' + c for c in 'BBBBXXBC']})

# Flag each (Vendor, Market) pair per period, then subtract: -1 left, 0 stayed, 1 entered
dfa1 = df_period_a.assign(Value=1).set_index(['Vendor','Market'])
dfb1 = df_period_b.assign(Value=1).set_index(['Vendor','Market'])
diff = dfa1.join(dfb1, how='outer', lsuffix='a', rsuffix='b').fillna(0).astype(int)
res = (diff.Valueb - diff.Valuea).rename('Change').reset_index()

# Three discrete colors, one per Change value
cmap = ListedColormap(['red', 'yellow', 'green'])
ax = sns.heatmap(res.pivot_table(values='Change', index='Vendor', columns='Market'), cmap=cmap)

# Put one label in the middle of each third of the colorbar
cb = ax.collections[0].colorbar
cb.set_ticks([-.67, 0, .67])
cb.set_ticklabels(['left', 'stayed', 'entered'])

# Keep all four spines visible around the heatmap
sns.despine(left=False, bottom=False, top=False, right=False)
plt.show()
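On the Sankey part of the question: a Sankey diagram needs a list of flows between nodes, and the same market name has to become two distinct nodes (one per period), otherwise MarketB to MarketB would be a self-loop. Below is a minimal sketch that builds such a list with pandas only. It assumes an inner merge on Vendor (so vendors that exist in only one period are left out), and the "(a)" / "(b)" node labels are my own naming. As far as I know, ipysankeywidget's SankeyWidget accepts a links list of dicts with source, target and value keys, but that last step is left as a comment because it only runs inside Jupyter with the package installed.

```python
import pandas as pd

# Example data copied from the question
df_period_a = pd.DataFrame({
    'Vendor': ['Vendor' + c for c in 'AAXZCBBD'],
    'Market': ['Market' + c for c in 'ABBBXXAA'],
})
df_period_b = pd.DataFrame({
    'Vendor': ['Vendor' + c for c in 'AXZCBDEF'],
    'Market': ['Market' + c for c in 'BBBBXXBC'],
})

# Keep only vendors present in both periods and pair up their markets
pairs = df_period_a.merge(
    df_period_b, on='Vendor', how='inner', suffixes=('_a', '_b')
)

# One row per (period-a market, period-b market) with the number of vendor rows
flows = pairs.groupby(['Market_a', 'Market_b']).size().reset_index(name='value')

# Suffix node names with the period so source and target nodes stay distinct
links = [
    {'source': f'{a} (a)', 'target': f'{b} (b)', 'value': int(v)}
    for a, b, v in flows.itertuples(index=False)
]
print(links)

# In a Jupyter notebook, assuming ipysankeywidget is installed:
# from ipysankeywidget import SankeyWidget
# SankeyWidget(links=links)
```

With only two periods, the heatmap above and a Sankey diagram show much the same information; a Sankey diagram tends to pay off once there are more periods or many markets.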