I am trying to implement weighted random selection in a DataFrame. I use the code below to build the DataFrame:
import pandas as pd
from numpy import exp
import random
moves = [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4)]
data = {'moves': list(map(lambda i: moves[i] if divmod(i, len(moves))[0] != 1 else moves[divmod(i, len(moves))[1]],
                          [i for i in range(2 * len(moves))])),
        'player': list(map(lambda i: 1 if i >= len(moves) else 2,
                           [i for i in range(2 * len(moves))])),
        'wins': [random.randint(0, 2) for i in range(2 * len(moves))],
        'playout_number': [random.randint(0, 1) for i in range(2 * len(moves))]
        }
frame = pd.DataFrame(data)
Then I created a list and inserted it as a new column, 'weight':
total = sum(map(lambda a, b: exp(a/b) if b != 0 else 0, frame['wins'], frame['playout_number']))
weights = list(map(lambda a, b: exp(a/b) / total if b != 0 else 0, frame['wins'], frame['playout_number']))
frame = frame.assign(weight=weights)
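The same weights can also be computed column-wise rather than with map and a lambda. The following is only a sketch: the small frame is hypothetical and merely reuses the column names above, and the helper names played, ratio and raw are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with the same column names as above.
frame = pd.DataFrame({
    'wins': [2, 0, 1, 1],
    'playout_number': [1, 0, 1, 0],
})

# Rows with at least one playout; the others get weight 0.
played = frame['playout_number'] != 0

# Divide only where playout_number is non-zero; other rows get ratio 0.
ratio = (frame['wins'] / frame['playout_number'].where(played)).fillna(0.0)

# exp(wins / playout_number) for played rows, 0 elsewhere, then normalize.
raw = np.where(played, np.exp(ratio), 0.0)
frame = frame.assign(weight=raw / raw.sum())
```

If no row has any playouts, raw.sum() is zero and the division yields NaN, so that case would need separate handling.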
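Now I want to select a random row according to each row's weight in the newly inserted column.
The problem is that I would like to use pandas.DataFrame.sample(weights=weight), but I don't know how. I can do this with numpy.random.choice (passing the weights as p=weights), but I would prefer to stick with pandas library functions.
Thanks in advance for your help.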
Answer 0 (score: 4)
You can use the parameter n or frac together with weights in sample.
The weights parameter can be array-like, so a list works:
df = frame.sample(n=1, weights=weights)
or you can pass a column of the DataFrame (a Series):
# select 1 row: n=1
df = frame.sample(n=1, weights=frame.weight)
print(df)
    moves  player  playout_number  wins    weight
6  (1, 2)       1               1     2  0.258325

# select 20% of the rows: frac=0.2
df = frame.sample(frac=0.2, weights=frame.weight)
print(df)
    moves  player  playout_number  wins    weight
5  (2, 4)       2               1     2  0.221747
4  (2, 3)       2               1     1  0.081576
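For a repeatable check of how the weights behave, here is a minimal sketch. It assumes a small hypothetical frame with hand-picked weights rather than the random data from the question. It passes the column name as a string to weights, fixes the seed with random_state, and shows the NumPy equivalent for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the weights are made up for illustration.
frame = pd.DataFrame({
    'moves': [(1, 2), (1, 3), (1, 4), (2, 1)],
    'weight': [0.5, 0.0, 0.3, 0.2],
})

# weights also accepts a column name; random_state makes the draw repeatable.
row = frame.sample(n=1, weights='weight', random_state=0)

# Many draws with replacement: relative frequencies approximate the weights,
# and the zero-weight row (index 1) is never selected.
draws = frame.sample(n=10_000, replace=True, weights='weight', random_state=0)
freq = np.bincount(draws.index.to_numpy(), minlength=len(frame)) / len(draws)

# NumPy equivalent: pick an index label with probabilities p, then look it up.
rng = np.random.default_rng(0)
idx = rng.choice(frame.index.to_numpy(), p=frame['weight'].to_numpy())
picked = frame.loc[[idx]]
```

Note that sample rescales weights that do not sum to 1, whereas the p argument in NumPy must already sum to 1.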