Question

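I have almost no formal training in discrete mathematics and have run into a bit of a problem. I am trying to write an agent that reads in a human player's (arbitrary) score and scores points of its own every so often. The agent needs to "lag behind" and "catch up" every so often, so that the human player believes some competition is going on. At the end, the agent must either win or lose against the human, depending on the condition.

I have tried a few different techniques, including a wonky probabilistic loop (which failed horribly). I think this problem calls for something like an emission hidden Markov model (HMM), but I am not sure how to implement it (or even whether that is the best approach).

I have a gist up, but again, it is bad.

I hope the __main__ function gives some insight into the goal of this agent. It is going to be called in pygame.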

Answer 1

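I think you may be overthinking this. You can use simple probability to decide how often, and by how much, the computer's score should "catch up". Additionally, you can calculate the difference between the human's score and the computer's score, and then feed it to a sigmoid-like function to get the degree by which the computer's score should increase.

Illustrative Python: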

#!/usr/bin/env python3
import math
import random

human_score = 0
computer_score = 0
trials = 100
computer_ahead_factor = 5  # maximum number of points the computer can be ahead by
computer_catchup_prob = 0.33  # probability of the computer catching up
computer_ahead_prob = 0.5  # probability of the computer being ahead of the human
computer_advantage_count = 0
for i in range(trials):
    # Simulate player score increase.
    human_score += random.randint(0, 5)  # add an arbitrary random amount
    # Simulate the computer lagging behind the human. p squashes the score
    # difference into (0, 1) and sets what fraction of the gap a catch-up closes.
    score_diff = human_score - computer_score
    p = (math.atan(score_diff) / (math.pi / 2) + 1) / 2
    if random.random() < computer_ahead_prob:
        # Jump to the human's score plus a small random lead.
        computer_score = human_score + random.randint(0, computer_ahead_factor)
    elif random.random() < computer_catchup_prob:
        # Close part of the gap.
        computer_score += int(abs(score_diff) * p)
    # Display scores.
    print('Human score:', human_score)
    print('Computer score:', computer_score)
    computer_advantage_count += computer_score > human_score
print('Effective computer advantage ratio: %.6f' % (computer_advantage_count / trials))
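
To see what the sigmoid-like term in the script is doing, it may help to look at it in isolation. The sketch below pulls the arctangent expression into a helper; the name `catchup_fraction` is made up here for illustration and is not part of the script above.

```python
import math

def catchup_fraction(score_diff):
    """Squash any score difference into the open interval (0, 1).

    score_diff is human_score - computer_score, as in the script above.
    A tie gives 0.5; the further behind the computer is, the closer to 1.
    """
    return (math.atan(score_diff) / (math.pi / 2) + 1) / 2

# Print the fraction for a few sample gaps.
for diff in (-10, -1, 0, 1, 10):
    print(diff, round(catchup_fraction(diff), 3))
```

A tie yields exactly 0.5 and a one-point deficit yields 0.75, so the curve saturates quickly. If that is too abrupt for your score scale, one option is to divide `score_diff` by a scale constant before taking the arctangent.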

Answer 2

I am assuming that the human cannot see the computer agent playing the game. If that is the case, here is one idea you might try.

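Create a list of all the possible point values that can be scored on any given move. For each move, find the score range you would like the agent to end up in after the current turn. Reduce the set of possible move values to only those that would end the agent within that range, and randomly select one. As conditions change for how far behind or ahead you want the agent to be, simply slide your range accordingly.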
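
The idea above could be sketched roughly as follows. The move values, the `offset` and `width` parameters, and the fallback for an empty candidate set are all assumptions made for illustration; the real values depend on your game's rules.

```python
import random

# Hypothetical point values a single move can score; replace with your game's.
MOVE_VALUES = [0, 1, 2, 3, 5]

def target_range(human_score, offset, width=2):
    """Range the agent should land in: centered `offset` points from the human.

    A negative offset makes the agent lag, a positive one makes it lead.
    """
    center = human_score + offset
    return center - width, center + width

def pick_move(agent_score, low, high, move_values=MOVE_VALUES):
    """Randomly pick a move value that leaves the agent inside [low, high]."""
    candidates = [v for v in move_values if low <= agent_score + v <= high]
    if candidates:
        return random.choice(candidates)
    # Assumed fallback: no move lands in range, so take the one that ends closest.
    center = (low + high) / 2
    return min(move_values, key=lambda v: abs(agent_score + v - center))

# Example turn: the human has 12 points, the agent has 8 and should trail by about 2.
low, high = target_range(human_score=12, offset=-2)
agent_score = 8 + pick_move(8, low, high)
```

Sliding the range then amounts to changing `offset` over time, for example moving it from negative to positive near the end of a game the agent is supposed to win.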

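If you are looking for something backed by research into the psychological effects on humans, I cannot help you there. If you want something more specific to your exact situation, you will need to define more of the rules for us.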

Following a dynamic score
