Question

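I need to optimize the coefficients of three numpy arrays so as to maximize my evaluation function. I have a target array named train['target'] and three prediction arrays named array1, array2 and array3.

I want to find the best linear coefficients x, y, z for these three arrays, i.e. the ones that maximize the function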

roc_auc_score(train['target'], x*array1 + y*array2 + z*array3)

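The function above is largest when the prediction is closer to the target, i.e. x*array1 + y*array2 + z*array3 should be close to train['target'].

The coefficients are restricted to the range x, y, z >= 0 and x, y, z <= 1.

Basically, I am trying to put a weight x, y, z on each of the three arrays so that the combination

x*array1 + y*array2 + z*array3

gets closer to train['target'].

Any help in getting this to work would be appreciated.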

I used pulp.LpProblem('Giapetto', pulp.LpMaximize) to do the maximization. It works for plain numbers, integers and so on, but it fails when I try it with arrays.

import numpy as np
import pulp

# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)

# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)

score =  roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)

prob += score

coef = x+y+z

prob += (coef==1)

# solve the LP using the default solver
optimization_result = prob.solve()

# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal

# display the results
for var in (x, y,z):
    print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))

It fails on the line

score =  roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)

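TypeError: unsupported operand type(s) for /: 'int' and 'LpVariable'

I cannot get past this line when using arrays. I am not sure whether my approach is correct. Any help with optimizing the function would be appreciated.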

Answer 1

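When you add sums of array elements to a PuLP model, you have to use built-in PuLP constructs such as lpSum to do it; you cannot simply add arrays together (as you discovered).

So your score definition should look something like this: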

score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])

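Some notes on this:

[+] You didn't provide the definition of roc_auc_score, so I just pretended that it equals the sum of the element-wise differences between the target array and the weighted sum of the 3 other arrays.

[+] I suspect that your actual calculation of roc_auc_score is nonlinear; more on this below.

[+] arr_ind is a list of the array indices, which I created like this: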

# build array index
arr_ind = range(len(array1))

[+] You also didn't include the arrays, so I created them like this:

array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)

train = {}
train['target'] = np.ones((10, 1))
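The suspicion in the notes above, that the real roc_auc_score is nonlinear, can be checked with a small numpy-only sketch. It assumes the score is the usual ROC AUC, i.e. the fraction of (positive, negative) pairs that the prediction ranks in the right order. The helper pairwise_auc and the toy arrays are hypothetical and only serve the illustration; they are not the asker's data or scikit-learn's implementation.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.
    Ties between a positive and a negative score count as half a pair."""
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # every positive vs. every negative
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Hypothetical toy data (not from the question)
target = np.array([0, 0, 1, 1, 0, 1])
pred_a = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
pred_b = np.array([0.3, 0.2, 0.6, 0.5, 0.9, 0.7])

blend = 0.5 * pred_a + 0.5 * pred_b

auc_blend = pairwise_auc(target, blend)
# Shrinking the blend moves it far away from the 0/1 targets,
# but the ordering of the samples, and therefore the AUC, is unchanged.
auc_shrunk = pairwise_auc(target, 0.01 * blend)

print(auc_blend, auc_shrunk)
```

Because the score depends only on the ordering of the predictions, it is a step function of the weights rather than a linear expression, which is why it cannot be handed to an LP solver directly.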

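Here is my complete code, which compiles and executes, although I'm sure it does not give you the result you are hoping for, since I only guessed at target and roc_auc_score: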

import numpy as np
import pulp

# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)

# dummy arrays since arrays weren't in OP code
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)

# build array index
arr_ind = range(len(array1))

# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)

# dummy roc_auc_score since roc_auc_score wasn't in OP code
train = {}
train['target'] = np.ones((10, 1))
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])

prob += score

coef = x + y + z

prob += coef == 1

# solve the LP using the default solver
optimization_result = prob.solve()

# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal

# display the results
for var in (x, y,z):
    print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))

Output:

Optimal weekly number of x to produce: 0
Optimal weekly number of y to produce: 0
Optimal weekly number of z to produce: 1

Process finished with exit code 0

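Now, if your roc_auc_score function is nonlinear, you will run into additional trouble. I would encourage you to try to formulate the score in a way that is linear, possibly by using additional variables (for example, if you want the score to be an absolute value).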
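As a rough sketch of what "using additional variables" could look like, the model below replaces the score with the sum of absolute errors between the target and the weighted blend, and minimizes it. This objective is an assumption chosen for illustration; it is a linear stand-in, not the ROC AUC. The random 1-D arrays, the problem name and the auxiliary variables err_i are likewise hypothetical.

```python
import numpy as np
import pulp

rng = np.random.default_rng(0)
n = 10

# Hypothetical stand-ins for array1..array3 and train['target'] (1-D here)
array1, array2, array3 = rng.random(n), rng.random(n), rng.random(n)
target = rng.integers(0, 2, size=n).astype(float)

# Minimize the total absolute error of the blend
prob = pulp.LpProblem("abs_error_blend", pulp.LpMinimize)

# Weights restricted to [0, 1], as in the question
x = pulp.LpVariable("x", lowBound=0, upBound=1)
y = pulp.LpVariable("y", lowBound=0, upBound=1)
z = pulp.LpVariable("z", lowBound=0, upBound=1)

# One auxiliary variable per sample; the two constraints below force
# err[i] >= |target[i] - blend_i|, and minimization makes it tight.
err = [pulp.LpVariable(f"err_{i}", lowBound=0) for i in range(n)]

prob += pulp.lpSum(err)   # objective: sum of absolute errors
prob += x + y + z == 1    # weights sum to one, as in the question's code

for i in range(n):
    blend_i = x * float(array1[i]) + y * float(array2[i]) + z * float(array3[i])
    prob += err[i] >= float(target[i]) - blend_i
    prob += err[i] >= blend_i - float(target[i])

status = prob.solve()     # default solver, as in the code above
weights = [v.value() for v in (x, y, z)]
print(pulp.LpStatus[status], weights)
```

The weights found this way minimize the absolute error only; whether they also give a good ROC AUC has to be checked separately on the real data.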
