根据多个条件为大型列表找到所有组合

时间:2019-07-17 14:36:39

标签: python mathematical-optimization itertools linear-programming

我正在尝试为幻想自行车比赛计算最佳球队。我有一个csv文件,其中包含176个自行车手,他们的球队,所获得的积分以及将其放入我的球队的价格。我正试图找到得分最高的16个自行车队。

适用于任何团队组成的规则是:

  • 团队的总费用不能超过100。
  • 同一支队伍中的幻想自行车参赛者不得超过4个。

下面是我的csv文件的简短摘录。

THOMAS Geraint,Team INEOS,142,13
SAGAN Peter,BORA - hansgrohe,522,11.5
GROENEWEGEN Dylan,Team Jumbo-Visma,205,11
FUGLSANG Jakob,Astana Pro Team,46,10
BERNAL Egan,Team INEOS,110,10
BARDET Romain,AG2R La Mondiale,21,9.5
QUINTANA Nairo,Movistar Team,58,9.5
YATES Adam,Mitchelton-Scott,40,9.5
VIVIANI Elia,Deceuninck - Quick Step,273,9.5
YATES Simon,Mitchelton-Scott,13,9
EWAN Caleb,Lotto Soudal,13,9

解决此问题的最简单方法是生成团队所有可能组合的列表,然后应用规则将不符合上述规则的团队排除在外。此后,很容易计算每个团队的总得分并选择得分最高的团队。从理论上讲,可以通过下面的代码生成可用团队的列表。

database_csv = pd.read_csv('renner_db_example.csv')
renners = database_csv.to_dict(orient='records')

budget = 100
max_same_team = 4
team_total = 16

combos = itertools.combinations(renners,team_total)
usable_combos = []

for i in combos:
    if sum(persoon["Waarde"] for persoon in i)  < budget and all(z <= max_same_team for z in [len(list(group)) for key, group in groupby([persoon["Ploeg"] for persoon in i])]) == True:   
    usable_combos.append(i)    

但是,即使too many calculations问题的答案暗示了其他内容,将176个自行车手的列表中的所有组合计算为16人的团队,也是this对我的计算机来说可以处理的事情。我已经进行了一些研究,没有找到方法来应用上述条件,而不必遍历每个选项。

是否可以通过使上述方法起作用或使用替代方法来找到最佳团队?

编辑: 在文本中,可以找到完整的CSV文件here

2 个答案:

答案 0 :(得分:9)

这是一个经典的operations research问题。

有成千上万种算法可以找到最佳解决方案(或取决于算法而定):

  • 混合整数编程
  • 元启发法
  • 约束编程
  • ...

以下代码将使用MIP,ortools库和默认求解器COIN-OR找到最佳解决方案:

from ortools.linear_solver import pywraplp
import pandas as pd


solver = pywraplp.Solver('cyclist', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)    
cyclist_df = pd.read_csv('cyclists.csv')

# Variables

variables_name = {}
variables_team = {}

for _, row in cyclist_df.iterrows():
    variables_name[row['Naam']] = solver.IntVar(0, 1, 'x_{}'.format(row['Naam']))
    if row['Ploeg'] not in variables_team:
        variables_team[row['Ploeg']] = solver.IntVar(0, solver.infinity(), 'y_{}'.format(row['Ploeg']))

# Constraints

# Link cyclist <-> team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, solver.infinity())
    constraint.SetCoefficient(var, 1)
    for cyclist in cyclist_df[cyclist_df.Ploeg == team]['Naam']:
        constraint.SetCoefficient(variables_name[cyclist], -1)

# Max 4 cyclist per team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, 4)
    constraint.SetCoefficient(var, 1)

# Max cyclists
constraint_max_cyclists = solver.Constraint(16, 16)
for cyclist in variables_name.values():
    constraint_max_cyclists.SetCoefficient(cyclist, 1)

# Max cost
constraint_max_cost = solver.Constraint(0, 100)
for _, row in cyclist_df.iterrows():
    constraint_max_cost.SetCoefficient(variables_name[row['Naam']], row['Waarde'])    

# Objective 
objective = solver.Objective()
objective.SetMaximization()

for _, row in cyclist_df.iterrows():
    objective.SetCoefficient(variables_name[row['Naam']], row['Punten totaal:'])

# Solve and retrieve solution     
solver.Solve()

chosen_cyclists = [key for key, variable in variables_name.items() if variable.solution_value() > 0.5]

print(cyclist_df[cyclist_df.Naam.isin(chosen_cyclists)])

打印:

    Naam                Ploeg                       Punten totaal:  Waarde
1   SAGAN Peter         BORA - hansgrohe            522             11.5
2   GROENEWEGEN         Dylan   Team Jumbo-Visma    205             11.0
8   VIVIANI Elia        Deceuninck - Quick Step     273             9.5
11  ALAPHILIPPE Julian  Deceuninck - Quick Step     399             9.0
14  PINOT Thibaut       Groupama - FDJ              155             8.5
15  MATTHEWS Michael    Team Sunweb                 323             8.5
22  TRENTIN Matteo      Mitchelton-Scott            218             7.5
24  COLBRELLI Sonny     Bahrain Merida              238             6.5
25  VAN AVERMAET Greg   CCC Team                    192             6.5
44  STUYVEN Jasper      Trek - Segafredo            201             4.5
51  CICCONE Giulio      Trek - Segafredo            153             4.0
82  TEUNISSEN Mike      Team Jumbo-Visma            255             3.0
83  HERRADA Jesús       Cofidis, Solutions Crédits  255             3.0
104 NIZZOLO Giacomo     Dimension Data              121             2.5
123 MEURISSE Xandro     Wanty - Groupe Gobert       141             2.0
151 TRATNIK Jan Bahrain Merida                      87              1.0

此代码如何解决问题?正如@KyleParsons所说,它看起来像背包问题,可以使用Integer Programming进行建模。

让我们定义变量Xi (0 <= i <= nb_cyclists)Yj (0 <= j <= nb_teams)

Xi = 1 if cyclist n°i is chosen, =0 otherwise

Yj = n where n is the number of cyclists chosen within team j

要定义这些变量之间的关系,可以对以下约束进行建模:

# Link cyclist <-> team
For all j, Yj >= sum(Xi, for all i where Xi is part of team j)

要最多只为每个团队选择4个自行车手,请创建以下约束条件:

# Max 4 cyclist per team
For all j, Yj <= 4

要选择16个自行车骑手,以下是相关的约束:

# Min 16 cyclists 
sum(Xi, 1<=i<=nb_cyclists) >= 16
# Max 16 cyclists 
sum(Xi, 1<=i<=nb_cyclists) <= 16

成本约束:

# Max cost 
sum(ci * Xi, 1<=i<=n_cyclists) <= 100 
# where ci = cost of cyclist i

然后您可以最大化

# Objective
max sum(pi * Xi, 1<=i<=n_cyclists)
# where pi = nb_points of cyclist i

请注意,我们使用线性目标和线性不等式约束对问题进行建模。如果Xi和Yj是连续变量,则此问题将是polynomial(线性编程),可以使用以下方法解决:

因为这些变量是整数(整数编程或混合整数编程),所以该问题被称为NP_complete类的一部分(除非您是genious,否则无法使用多项式解决方案来解决)。像COIN-OR这样的求解器使用复杂的Branch & BoundBranch & Cut方法来有效地求解。 ortools提供了一个很好的包装器,可以将COIN与python一起使用。这些工具是免费和开源的。

所有这些方法的优点是无需迭代所有可能的解决方案即可找到最佳解决方案(并大大减少了组合运算法则)。

答案 1 :(得分:1)

我为您的问题添加了另一个答案:

  

我发布的CSV实际上已被修改,我的原始CSV还包含每个骑手的列表以及每个阶段的得分。此列表看起来像这样的[0, 40, 13, 0, 2, 55, 1, 17, 0, 14]。我试图找到表现最佳的团队。因此,我有16名自行车手,其中10名自行车手的得分将计入每天的得分。然后将每天的分数相加以获得总分数。目的是使最终总分尽可能高。

如果您认为我应该编辑我的第一篇文章,请告诉我,我认为这样更清楚,因为我的第一篇文章非常密集,可以回答最初的问题。

我们引入一个新变量:

Zik = 1 if cyclist i is selected and is one of the top 10 in your team on day k

您需要将这些约束添加到链接变量Zik和Xi(如果未选择骑单车的人i,即Xi = 0则变量Zik不能= 1)

For all i, sum(Zik, 1<=k<=n_days) <= n_days * Xi

这些限制每天要选择10个自行车骑手:

For all k, sum(Zik, 1<=i<=n_cyclists) <= 10

最后,您的目标可以这样写:

Maximize sum(pik * Xi * Zik, 1<=i<=n_cyclists, 1 <= k <= n_days)
# where pik = nb_points of cyclist i at day k

这是思考部分。这样写的物镜不是线性的(注意两个变量X和Z之间的乘法)。幸运的是,它们都有二进制文件,并且有一个技巧可以将该公式转换为线性形式。

enter image description here

让我们再次引入新变量Lik(Lik = Xi * Zik)以使目标线性化。

目标现在可以这样写并且是线性的:

Maximize sum(pik * Lik, 1<=i<=n_cyclists, 1 <= k <= n_days)
# where pik = nb_points of cyclist i at day k

我们现在需要添加这些约束,以使Lik等于Xi * Zik

For all i,k : Xi + Zik - 1 <= Lik
For all i,k : Lik <= 1/2 * (Xi + Zik)

瞧瞧。这是数学的魅力,您可以使用线性方程式对很多事物进行建模。我介绍了高级概念,如果您乍一看不明白,这是正常的。


我在此file上模拟了“每日得分”列。

这是解决新问题的Python代码:

import ast
from ortools.linear_solver import pywraplp
import pandas as pd


solver = pywraplp.Solver('cyclist', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
cyclist_df = pd.read_csv('cyclists_day.csv')
cyclist_df['Punten_day'] = cyclist_df['Punten_day'].apply(ast.literal_eval)

# Variables
variables_name = {}
variables_team = {}
variables_name_per_day = {}
variables_linear = {}

for _, row in cyclist_df.iterrows():
    variables_name[row['Naam']] = solver.IntVar(0, 1, 'x_{}'.format(row['Naam']))
    if row['Ploeg'] not in variables_team:
        variables_team[row['Ploeg']] = solver.IntVar(0, solver.infinity(), 'y_{}'.format(row['Ploeg']))

    for k in range(10):
        variables_name_per_day[(row['Naam'], k)] = solver.IntVar(0, 1, 'z_{}_{}'.format(row['Naam'], k))
        variables_linear[(row['Naam'], k)] = solver.IntVar(0, 1, 'l_{}_{}'.format(row['Naam'], k))

# Link cyclist <-> team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, solver.infinity())
    constraint.SetCoefficient(var, 1)
    for cyclist in cyclist_df[cyclist_df.Ploeg == team]['Naam']:
        constraint.SetCoefficient(variables_name[cyclist], -1)

# Max 4 cyclist per team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, 4)
    constraint.SetCoefficient(var, 1)

# Max cyclists
constraint_max_cyclists = solver.Constraint(16, 16)
for cyclist in variables_name.values():
    constraint_max_cyclists.SetCoefficient(cyclist, 1)

# Max cost
constraint_max_cost = solver.Constraint(0, 100)
for _, row in cyclist_df.iterrows():
    constraint_max_cost.SetCoefficient(variables_name[row['Naam']], row['Waarde'])

# Link Zik and Xi
for name, cyclist in variables_name.items():
    constraint_link_cyclist_day = solver.Constraint(-solver.infinity(), 0)
    constraint_link_cyclist_day.SetCoefficient(cyclist, - 10)
    for k in range(10):
        constraint_link_cyclist_day.SetCoefficient(variables_name_per_day[name, k], 1)

# Min/Max 10 cyclists per day
for k in range(10):
    constraint_cyclist_per_day = solver.Constraint(10, 10)
    for name in cyclist_df.Naam:
        constraint_cyclist_per_day.SetCoefficient(variables_name_per_day[name, k], 1)

# Linearization constraints 
for name, cyclist in variables_name.items():
    for k in range(10):
        constraint_linearization1 = solver.Constraint(-solver.infinity(), 1)
        constraint_linearization2 = solver.Constraint(-solver.infinity(), 0)

        constraint_linearization1.SetCoefficient(cyclist, 1)
        constraint_linearization1.SetCoefficient(variables_name_per_day[name, k], 1)
        constraint_linearization1.SetCoefficient(variables_linear[name, k], -1)

        constraint_linearization2.SetCoefficient(cyclist, -1/2)
        constraint_linearization2.SetCoefficient(variables_name_per_day[name, k], -1/2)
        constraint_linearization2.SetCoefficient(variables_linear[name, k], 1)

# Objective 
objective = solver.Objective()
objective.SetMaximization()

for _, row in cyclist_df.iterrows():
    for k in range(10):
        objective.SetCoefficient(variables_linear[row['Naam'], k], row['Punten_day'][k])

solver.Solve()

chosen_cyclists = [key for key, variable in variables_name.items() if variable.solution_value() > 0.5]

print('\n'.join(chosen_cyclists))

for k in range(10):
    print('\nDay {} :'.format(k + 1))
    chosen_cyclists_day = [name for (name, day), variable in variables_name_per_day.items() 
                       if (day == k and variable.solution_value() > 0.5)]
    assert len(chosen_cyclists_day) == 10
    assert all(chosen_cyclists_day[i] in chosen_cyclists for i in range(10))
    print('\n'.join(chosen_cyclists_day))

结果如下:

您的团队:

SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
BENOOT Tiesj
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús
MEURISSE Xandro
GRELLIER Fabien

每天精选骑自行车的人

Day 1 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús

Day 2 :
SAGAN Peter
ALAPHILIPPE Julian
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
NIZZOLO Giacomo
MEURISSE Xandro

Day 3 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
MATTHEWS Michael
TRENTIN Matteo
VAN AVERMAET Greg
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús

Day 4 :
SAGAN Peter
VIVIANI Elia
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús

Day 5 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
CICCONE Giulio
HERRADA Jesús

Day 6 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike

Day 7 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús
MEURISSE Xandro

Day 8 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús
NIZZOLO Giacomo
MEURISSE Xandro

Day 9 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
TEUNISSEN Mike
HERRADA Jesús

Day 10 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
PINOT Thibaut
COLBRELLI Sonny
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús
NIZZOLO Giacomo

让我们比较答案1和答案2 print(solver.Objective().Value())的结果:

第一个模型得到3738.0,第二个模型得到3129.087388325567。该值较低,因为您每个阶段只选择10个自行车手,而不是16个。

现在,如果保留第一个解决方案并使用新的评分方法,我们将获得3122.9477585307413

我们可以认为第一个模型足够好:我们不必引入新的变量/约束,模型保持简单,并且得到的解决方案几乎与复杂模型一样好。有时不一定要100%准确,而且可以通过一些近似值更轻松快速地求解模型。