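I have a list of cost values as input and want to compute a probability for each one, where the probabilities change with elapsed time. Lower costs should start out with a higher probability, but that probability should gradually decrease so that higher costs receive a larger share as time passes.
For example, at time = 0: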
costs = [0, 2, 3, 5]
time = 0
expected_result = [0.99, 0.005, 0.003, 0.002]  # should sum to 1.0
On the next iteration, at time = 1:
costs = [0, 2, 3, 5]
time = 1
expected_result = [0.95, 0.025, 0.015, 0.01]
And at some arbitrarily later time:
costs = [0, 2, 3, 5]
time = 10
expected_result = [0.5, 0.25, 0.15, 0.1]
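To pin down what I am after, the properties of the expected results can be written as checks. This is only a rough sketch: `check_distribution` is a hypothetical helper name, and it assumes the probabilities should never increase with cost at a fixed time, which holds for the three examples above but is my own reading of them.

```python
import math

def check_distribution(probabilities, costs):
    """Return True if the values sum to 1 and never increase as cost increases."""
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=1e-9)
    # Order the probabilities by their associated cost, cheapest first
    by_cost = [p for _, p in sorted(zip(costs, probabilities))]
    non_increasing = all(a >= b for a, b in zip(by_cost, by_cost[1:]))
    return sums_to_one and non_increasing

costs = [0, 2, 3, 5]
expected = {
    0: [0.99, 0.005, 0.003, 0.002],
    1: [0.95, 0.025, 0.015, 0.01],
    10: [0.5, 0.25, 0.15, 0.1],
}

for time, probabilities in expected.items():
    assert check_distribution(probabilities, costs)

# The share of the cheapest cost should shrink as time grows
cheapest_share = [expected[t][0] for t in sorted(expected)]
assert cheapest_share == sorted(cheapest_share, reverse=True)
```

Any candidate formula could be run through the same checks in place of the hand-written expected values.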
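I couldn't find a similar question to learn from, but that may be because I'm not sure what the right search terms are for what I'm trying to achieve.
Edit:
To give some background, I'm designing an agent for a game that makes decisions by choosing one action at each step, where a step can be arbitrarily long. Actions incur a cost in the form of in-game resources or expected outcomes. I want the agent to start the game by choosing safer, cheaper actions but to behave more riskily as the game progresses. The general idea is to build in a sense of urgency and to model that behavior in a Markov decision process whose transition probabilities take both cost and time into account. Costs are already mapped to actions and are computed in advance. Below is code that approximates the expected results.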
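def calculate_probability(costs, time):
    probabilities = []
    adjustment = 0.001  # To avoid division by zero
    rate = 15  # Larger values slow the shift toward higher costs
    for c in costs:
        # Weight is the inverse of the cost plus a time-dependent offset
        probabilities.append(1 / ((c + adjustment) + ((time + adjustment) / rate)))
    total = sum(probabilities)
    return [p / total for p in probabilities]  # Scale values so they sum to 1
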
costs = [0, 2, 3, 5]
time = 0
probabilities = calculate_probability(costs, time)
print("costs: {} time: {}".format(costs, time))
print("probabilities: {}".format(probabilities))
time = 1
probabilities = calculate_probability(costs, time)
print("costs: {} time: {}".format(costs, time))
print("probabilities: {}".format(probabilities))
time = 10
probabilities = calculate_probability(costs, time)
print("costs: {} time: {}".format(costs, time))
print("probabilities: {}".format(probabilities))
Result:
costs: [0, 2, 3, 5] time: 0
probabilities: [0.9988994464993101, 0.0005324623915241525, 0.0003550380119066324, 0.00021305309725910417]
costs: [0, 2, 3, 5] time: 1
probabilities: [0.9361523759693375, 0.03066581164511371, 0.020669567411005885, 0.012512244974542817]
costs: [0, 2, 3, 5] time: 10
probabilities: [0.6450909018718866, 0.16146617535857696, 0.11744275252924269, 0.07600017024029376]
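For context, this is roughly how I expect the agent to use these probabilities when picking an action. It is a minimal sketch rather than my actual game code: the action names and `choose_action` are made up for illustration, `calculate_probability` is restated with `rate` and `adjustment` as keyword arguments so the block runs on its own, and the sampling uses the standard library's `random.choices`.

```python
import random

def calculate_probability(costs, time, rate=15, adjustment=0.001):
    # Same formula as above: inverse of cost plus a time-dependent offset
    weights = [1 / ((c + adjustment) + ((time + adjustment) / rate)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]

def choose_action(actions, costs, time, rng=random):
    """Sample one action, weighted by its time-dependent probability."""
    probabilities = calculate_probability(costs, time)
    return rng.choices(actions, weights=probabilities, k=1)[0]

actions = ["wait", "scout", "trade", "attack"]  # hypothetical action names
costs = [0, 2, 3, 5]
rng = random.Random(42)  # seeded only to make runs repeatable

for time in (0, 1, 10):
    picks = [choose_action(actions, costs, time, rng) for _ in range(1000)]
    counts = {action: picks.count(action) for action in actions}
    print("time: {} counts: {}".format(time, counts))
```

With enough samples, the counts at each time should roughly follow the probabilities printed above, with the costlier actions being picked more often at later times.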