Suppose we have a DataFrame like the one below.
Games  Players  Score
0      Foo      100
       Bar      10
       Baz      5
1      Blah     30
       Bar      10
       Foo      2
2      Foo      40
       Fes      5
...
I would like to process it to build a new DataFrame (a matrix) in which:
pairwise_comparisons.loc[A, B] = W / T
where
W = # of games where A ended up with a higher score than B
T = # of games in which they both participated
How should I go about this?
For example, using only the data shown above, we would fill the matrix as follows:
pairwise_comparisons.loc['Foo', 'Bar'] = 1/2
because Foo and Bar both played in games 0 and 1 (2 games) and Foo won 1 of them (game 0), so W / T = 1/2.
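I could of course loop over every pair of players manually and compare their scores in each game, but that would probably be slow. Any ideas on how to vectorize the solution?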
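For reference, the manual loop mentioned above might look something like the sketch below. It assumes the data is held as a Series indexed by (game, player); the names `scores`, `pairwise_loop` and `baseline` are hypothetical and only serve the illustration. It is meant as a slow but easy-to-check baseline, not as the desired solution.

```python
from itertools import permutations

import numpy as np
import pandas as pd

# Hypothetical reconstruction of the example data: scores indexed by (game, player).
scores = pd.Series({
    (0, 'Foo'): 100, (0, 'Bar'): 10, (0, 'Baz'): 5,
    (1, 'Blah'): 30, (1, 'Bar'): 10, (1, 'Foo'): 2,
    (2, 'Foo'): 40, (2, 'Fes'): 5,
})


def pairwise_loop(scores):
    """Naive baseline: W / T for every ordered pair of players."""
    # Rows are games, columns are players, NaN means "did not play".
    wide = scores.unstack()
    players = wide.columns
    result = pd.DataFrame(np.nan, index=players, columns=players)
    for a, b in permutations(players, 2):
        both = wide[[a, b]].dropna()  # games in which both took part (T rows)
        if len(both):
            # W = games where a scored strictly higher than b
            result.loc[a, b] = (both[a] > both[b]).sum() / len(both)
    return result


baseline = pairwise_loop(scores)
print(baseline.loc['Foo', 'Bar'])
```

Pairs that never met stay NaN, and the diagonal is left NaN as well since a player is never compared with themselves.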
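A variation on the above: instead of computing W / T, we could store the median difference in score between A and B over the games in which they both participated.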
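To make the variation concrete, here is a minimal sketch of one way the median difference could be computed with NumPy broadcasting. It assumes the same (game, player) Series layout; `scores`, `wide`, `diff` and `median_diff` are hypothetical names introduced for illustration.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical reconstruction of the example data: scores indexed by (game, player).
scores = pd.Series({
    (0, 'Foo'): 100, (0, 'Bar'): 10, (0, 'Baz'): 5,
    (1, 'Blah'): 30, (1, 'Bar'): 10, (1, 'Foo'): 2,
    (2, 'Foo'): 40, (2, 'Fes'): 5,
})

wide = scores.unstack()            # rows: games, columns: players, NaN = absent
v = wide.to_numpy(dtype=float)     # shape (games, players)

# diff[g, a, b] = score of a minus score of b in game g.
# If either player missed game g, the difference is NaN.
diff = v[:, :, None] - v[:, None, :]

with warnings.catch_warnings():
    # Pairs that never met produce an all-NaN slice; nanmedian warns and returns NaN.
    warnings.simplefilter("ignore", RuntimeWarning)
    med = np.nanmedian(diff, axis=0)

np.fill_diagonal(med, np.nan)      # a player is not compared with themselves
median_diff = pd.DataFrame(med, index=wide.columns, columns=wide.columns)
print(median_diff.loc['Foo', 'Bar'])
```

With the sample data, Foo and Bar met twice with differences of 90 and -8, so the median for that pair is 41.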
Answer 0 (score: 2)
Setup
import numpy as np
import pandas as pd

# scores indexed by (game, player)
s = pd.Series({
    (0, 'Bar'): 10,
    (0, 'Baz'): 5,
    (0, 'Foo'): 100,
    (1, 'Bar'): 10,
    (1, 'Blah'): 30,
    (1, 'Foo'): 2,
    (2, 'Fes'): 5,
    (2, 'Foo'): 40
})
df = s.unstack()
v = df.values
m, n = v.shape
nrng = np.arange(n)
# who played who
played = (~np.isnan(v))
played_3d = played.reshape(m, 1, n) & played.reshape(m, n, 1)
played_3d[:, nrng, nrng] = False
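# who beat who: winners[g, a, b] is 1 when player a outscored player b in game g
# (row player on the (m, n, 1) side, so that .loc[A, B] counts wins of A over B)
scores = np.where(played, v, -1)
winners = np.where(
    played_3d,
    scores.reshape(m, n, 1) > scores.reshape(m, 1, n),
    0
)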
# how many times have we played each other
games_played = played_3d.sum(0)
games_won = winners.sum(0)
# np.float was removed in NumPy 1.24; the builtin float is equivalent
pairwise = np.full((n, n), np.nan, dtype=float)
r, c = np.where(games_played != 0)
pairwise[r, c] = games_won[r, c] / games_played[r, c]
pairwise_comparisons = pd.DataFrame(pairwise, df.columns, df.columns).stack()
pairwise_comparisons.loc['Foo', 'Bar']
0.5
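The same broadcasting idea can probably be written a little more compactly by relying on the fact that any comparison involving NaN is False, so absent players never count as winners. The following is only a sketch under that assumption; the names `wide`, `wins`, `together` and `ratio` are hypothetical and not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the example data: scores indexed by (game, player).
scores = pd.Series({
    (0, 'Foo'): 100, (0, 'Bar'): 10, (0, 'Baz'): 5,
    (1, 'Blah'): 30, (1, 'Bar'): 10, (1, 'Foo'): 2,
    (2, 'Foo'): 40, (2, 'Fes'): 5,
})

wide = scores.unstack()                 # rows: games, columns: players
v = wide.to_numpy(dtype=float)          # NaN where a player skipped a game
played = (~np.isnan(v)).astype(int)

with np.errstate(invalid='ignore', divide='ignore'):
    # wins[a, b] = number of games where a scored strictly more than b (W).
    # Comparisons against NaN evaluate to False, so absences add nothing.
    wins = (v[:, :, None] > v[:, None, :]).sum(axis=0)
    # together[a, b] = number of games both a and b took part in (T).
    together = played.T @ played
    # 0 / 0 becomes NaN for pairs that never met.
    frac = wins / together

np.fill_diagonal(frac, np.nan)          # ignore self-comparisons
ratio = pd.DataFrame(frac, index=wide.columns, columns=wide.columns)
print(ratio.loc['Foo', 'Bar'])
```

The intermediate boolean array still has shape (games, players, players), so memory grows with the square of the number of players; for very large inputs it may be worth processing the games in chunks.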