How do I assign a column to a DataFrame as per-row weights, and then sample the DataFrame according to those weights?

Time: 2017-04-30 04:46:14

Tags: pandas dataframe python-3.5

I am trying to implement weighted random selection on a DataFrame. I build the DataFrame with the code below:

import pandas as pd
from numpy import exp
import random

moves = [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4)]


n = len(moves)

data = {
    # every move appears twice: rows 0..n-1 belong to player 2, rows n..2n-1 to player 1
    'moves': [moves[i % n] for i in range(2 * n)],
    'player': [1 if i >= n else 2 for i in range(2 * n)],
    'wins': [random.randint(0, 2) for _ in range(2 * n)],
    'playout_number': [random.randint(0, 1) for _ in range(2 * n)],
}
frame = pd.DataFrame(data)

Then I built a list and inserted it as a new column, 'weight':

total = sum(map(lambda a, b: exp(a/b) if b != 0 else 0, frame['wins'], frame['playout_number']))
weights = list(map(lambda a, b: exp(a/b) / total if b != 0 else 0, frame['wins'], frame['playout_number']))
frame = frame.assign(weight=weights)
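
The two map calls above can also be written with vectorized operations. The following is only a minimal sketch on a small made-up frame (the values are hypothetical, not the question's random data); it assumes the same rule as above, namely exp(wins / playout_number) where playout_number is nonzero and 0 otherwise, normalized to sum to 1.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same two columns used in the question.
frame = pd.DataFrame({'wins': [2, 0, 1, 1], 'playout_number': [1, 0, 1, 0]})

# True where a playout exists, so the division is safe.
played = frame['playout_number'] != 0

# Replace the divisor by 1 on unplayed rows to avoid dividing by zero;
# those rows are forced to a score of 0 on the next line anyway.
ratio = frame['wins'] / frame['playout_number'].where(played, 1)
scores = np.exp(ratio).where(played, 0.0)

# Normalize so the weights sum to 1 (this yields NaN if no row was played).
frame['weight'] = scores / scores.sum()
print(frame)
```

If every playout_number is 0, the sum is 0 and the weights become NaN, so that case would need separate handling.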

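Now I want to select one random row according to the per-row weights in the newly inserted column.
The problem is that I want to use pandas.DataFrame.sample(weights=weight), but I don't know how. I could do it with numpy.random.choice(p=weights), but I would prefer to stick with pandas library functions. Thanks in advance for your help.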

1 Answer:

Answer 0 (score: 4)

You can use the parameter n or frac together with weights in sample.

The weights parameter accepts an array-like, so you can pass the list:

df = frame.sample(n=1, weights=weights)

Or a column of the DataFrame (a Series):

# select 1 row (n=1)
df = frame.sample(n=1, weights=frame.weight)
print(df)
    moves  player  playout_number  wins    weight
6  (1, 2)       1               1     2  0.258325
# select 20% of the rows (frac=0.2)
df = frame.sample(frac=0.2, weights=frame.weight)
print(df)
    moves  player  playout_number  wins    weight
5  (2, 4)       2               1     2  0.221747
4  (2, 3)       2               1     1  0.081576
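
Two further details of sample may be useful here: when called on a DataFrame, weights also accepts the name of a column, and weights that do not sum to 1 are normalized by pandas. The sketch below illustrates both on a small made-up frame (the values and the fixed random_state are assumptions for reproducibility, not part of the question).

```python
import pandas as pd

# Hypothetical stand-in for the question's frame; the weights are
# deliberately left unnormalized (they sum to 10, not 1).
frame = pd.DataFrame({
    'moves': [(1, 2), (1, 3), (1, 4), (2, 1)],
    'weight': [3.0, 1.0, 0.0, 6.0],
})

# Pass the column name instead of a list or Series.
picked = frame.sample(n=1, weights='weight', random_state=0)
print(picked)

# Draw many rows with replacement to see the weights at work:
# index 3 (weight 6) should dominate and index 2 (weight 0) never appears.
draws = frame.sample(n=1000, weights='weight', replace=True, random_state=0)
counts = draws.index.value_counts()
print(counts)
```

Rows with a weight of 0 are never selected, which matches the question's intent for moves with no playouts.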