python统计分析

时间:2010-10-12 18:48:09

标签: python statistics

鉴于15名球员 - 2名守门员,5名后卫,5名中场球员和3名前锋,以及每名球员都有价值和得分的事实,我想计算得分最高的球队。每个团队必须由1 GK组成,然后是一个阵型,例如4:4:2,4:3:3等我开始使用这样的样本数据

玩家角色点费用

然后我做了以下评估所有组合

将每行读入一个列表(对于每个角色),然后在嵌套运行中使用itertools来获取所有组合

if line[1] == "G": G.append(line[0])
if line[1] == "D": D.append(line[0])
if line[1] == "M": M.append(line[0])
if line[1] == "S": S.append(line[0])

for gk in itertools.combinations(G,1):
    for de in itertools.combinations(D,4):
        for mi in itertools.combinations(M,4):
            for st in itertools.combinations(S,2):
                teams[str(count)]= " ".join(gk)+" "+" ".join(de)+" "+" ".join(mi)+" "+" ".join(st)
                count +=1

有了团队,我会计算他们的积分值和团队成本。如果它低于阈值,我打印出来。
但如果我现在让这20名守门员,150名后卫,150名中场球员和100名前锋,我可以理解为失去记忆。
我该怎么做才能进行这种分析?它是一个生成器而不是我需要的递归函数吗?

非常感谢

2 个答案:

答案 0 :(得分:5)

您可以通过递归解决此问题。以下显示了基本概要,但忽略了团队由特定数量的特定类型的玩家组成的细节。

players=[{'name':'A','score':5,'cost':10},
         {'name':'B','score':10,'cost':3},
         {'name':'C','score':6,'cost':8}]

def player_cost(player):
    return player['cost']
def player_score(player):
    return player['score']
def total_score(players):
    return sum(player['score'] for player in players)

def finance_team_recurse(budget, available_players):
    affordable_players=[]
    for player in available_players:
        if player_cost(player)<=budget:
            # Since we've ordered available players, the first player appended
            # will be the one with the highest score.
            affordable_players.append(player)
    result=[]
    if affordable_players:
        candidate_player=affordable_players[0]
        other_players=affordable_players[1:]
        # if you include candidate_player on your team
        team_with_candidate=finance_team_recurse(budget-player_cost(candidate_player),
                                                 other_players)
        team_with_candidate.append(candidate_player)
        score_of_team_with_candidate=total_score(team_with_candidate)
        if score_of_team_with_candidate>total_score(other_players):
            result=team_with_candidate
        else:
            # if you exclude candidate_player from your team
            team_without_candidate=finance_team_recurse(budget, other_players)
            score_of_team_without_candidate=total_score(team_without_candidate)
            if score_of_team_with_candidate>score_of_team_without_candidate:
                result=team_with_candidate
            else:
                result=team_without_candidate
    return result

def finance_team(budget, available_players):
    tmp=available_players[:]
    # Sort so player with highest score is first. (Greedy algorithm?)
    tmp.sort(key=player_score, reverse=True)
    return finance_team_recurse(budget,tmp)

print(finance_team(20,players))
# [{'score': 6, 'cost': 8, 'name': 'C'}, {'score': 10, 'cost': 3, 'name': 'B'}]

20 choose 1 = 20 combinations
150 choose 4 = 20260275 combinations
100 choose 2 = 4950 combinations

所以总共有20 * 20260275 * 20260275 * 4950 = 40637395564486875000L teams字典中的项目。这需要很多记忆。

for gk in itertools.combinations(G,1):
    for de in itertools.combinations(D,4):
        for mi in itertools.combinations(M,4):
            for st in itertools.combinations(S,2):    
                #Don't collect the results into a dict.
                #That's what's killing you (memory-wise).
                #Just compute the cost and
                #Just print the result here.

PS。 40637395564486875000L大约为10**19。假设您的程序每秒可以处理10**6个组合,程序完成将需要大约130万年......

答案 1 :(得分:0)

函数和生成器有很多帮助:

def make_teams(G, D, M, S):
    """ returns all possible teams """
    for gk in itertools.combinations(G,1):
        for de in itertools.combinations(D,4):
            for mi in itertools.combinations(M,4):
                for st in itertools.combinations(S,2):
                    yield gk, de, mi, st

def get_cost( team ):
    return sum( member.cost for member in team )

def good_teams( min_score=0):
    for team in make_teams(G, D, M, S):
        if get_cost( team ) > min_score:
            yield team

for team in good_teams(min_score=100):
    print team

它仍会生成所有可能的组合,因此您现在可能会用完时间而不是内存。

你正在做的事情似乎是knapsack problem的变体 - 你可以比尝试所有可能的组合更好,但更好

快速获得良好解决方案的一种方法是按玩家的分数对玩家进行排序。你应该首先获得最高得分队伍,但无法保证你能得到最好的解决方案。维基百科称之为“贪婪近似算法”。

def score_per_cost( player ):
    return player.score / player.cost

def sorted_combinations(seq, n):
    return itertools.combinations(
        sorted(seq, key=score_per_cost, reverse=True),n)

def make_teams(G, D, M, S):
    """ returns all possible teams """
    for gk in sorted_combinations(G,1):
        for de in sorted_combinations(D,4):
            for mi in sorted_combinations(M,4):
                for st in sorted_combinations(S,2):
                    yield gk, de, mi, st

def get_cost( team ):
    return sum( member.cost for member in team )

def top_teams(n):
    return itertools.islice(make_teams(G, D, M, S),n)

for team in top_teams(100):
    print team

我将向读者添加“每队费用&lt;阈值”要求(提示:它是make_teams中的一行:p)。