Question

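"In general, creating linear constraints in batches gives better performance than creating them one at a time. I just wanted to know if there is a huge problem." - a clever programmer.

To be clear, I have a (35k x 40) dataset and I want to run an SVM on it. I need to generate the Gram matrix of this dataset, which is fine, but passing the coefficients to CPLEX is a mess and takes hours. Here is my code: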

    import numpy as np

    nn = 35000
    XXt = np.random.rand(nn, nn)  # the Gram matrix of the dataset
    yy = np.random.rand(nn)       # the label vector of the dataset

    temp = ((yy * XXt).T) * yy
    xg, yg = np.meshgrid(range(nn), range(nn))
    indici = np.dstack([yg, xg])

    # Collect the upper-triangle entries as [i, j, c_ij] triples
    quadratic_part = []
    for ii in range(nn):
        for indd in indici[ii][ii:]:
            quadratic_part.append([indd[0], indd[1], temp[indd[0], indd[1]]])

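'quadratic_part' is a list of entries of the form [i, j, c_ij], where c_ij is the coefficient stored in temp. It will be passed to the function 'objective.set_quadratic_coefficients()' of the CPLEX Python API.

Is there a smarter way to do this?

P.S. I may also have a memory problem, so instead of storing the whole list 'quadratic_part', it might be better to call 'objective.set_quadratic_coefficients()' several times... do you know what I mean?!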

Answer 1

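Under the hood, objective.set_quadratic uses the CPXXcopyquad function from the C Callable Library, which copies the whole Q matrix in one call. objective.set_quadratic_coefficients, by contrast, uses CPXXchgqpcoef, which changes a single coefficient, so a list of triples turns into one call per triple.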
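The two methods also take differently shaped input. As a rough illustration in plain Python (no CPLEX needed), here is a sketch that builds both shapes from a small symmetric matrix. The helper names `to_triples` and `to_rows` are made up for this sketch and are not part of the CPLEX API.

```python
# Hypothetical helpers, for illustration only; not part of the CPLEX API.

def to_triples(q):
    """Upper-triangle (i, j, q_ij) triples: the shape passed to
    objective.set_quadratic_coefficients."""
    n = len(q)
    return [(i, j, q[i][j]) for i in range(n) for j in range(i, n)]


def to_rows(q):
    """One [indices, values] pair per row: the shape passed to
    objective.set_quadratic."""
    n = len(q)
    return [[list(range(n)), list(row)] for row in q]


# A small symmetric matrix standing in for Q
q = [[2.0, 0.5, 0.0],
     [0.5, 3.0, 1.0],
     [0.0, 1.0, 4.0]]

triples = to_triples(q)  # n * (n + 1) / 2 entries
rows = to_rows(q)        # n entries, one per variable
```

For nn = 35000 the triple list would have 35000 * 35001 / 2 = 612,517,500 entries, each handled individually, while the row form has 35000 entries handed over in a single call.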

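Here is an example that uses set_quadratic (keep in mind that I'm not a numpy expert; there is quite possibly a better way to do this part):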

    import numpy as np
    import cplex

    nn = 5  # a small example size here

    XXt = np.random.rand(nn, nn)  # the Gram matrix of the dataset
    yy = np.random.rand(nn)       # the label vector of the dataset
    temp = ((yy * XXt).T) * yy

    # create a symmetric matrix
    tempu = np.triu(temp)      # upper triangle
    iu1 = np.triu_indices(nn, 1)
    tempu.T[iu1] = tempu[iu1]  # copy upper into lower

    # one [indices, values] pair per row of Q, as plain Python lists
    qmat = []
    for i in range(nn):
        qmat.append([list(range(nn)), tempu[i].tolist()])

    c = cplex.Cplex()
    c.variables.add(lb=[0] * nn)
    c.objective.set_quadratic(qmat)
    c.write("test2.lp")

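Your Q matrix is completely dense, so depending on how much memory you have, this technique may not scale. Where possible, though, you should get better performance by initializing the Q matrix with objective.set_quadratic. Perhaps you will need a hybrid approach that uses both set_quadratic and set_quadratic_coefficients.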
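On the memory side, one thing that might help is to avoid materializing temp, the meshgrid, and the index array all at once. The sketch below computes the label-scaled Gram matrix a block of rows at a time, straight from the (n x d) data. It assumes the data matrix `X` and label vector `y` are available as numpy arrays; the function name `scaled_gram_rows` and the `block` parameter are made up for this illustration.

```python
import numpy as np


def scaled_gram_rows(X, y, block=1000):
    """Yield (first_row, q_block), where q_block holds rows
    first_row .. first_row + len(q_block) - 1 of Q and
    Q[i, j] = y[i] * y[j] * dot(X[i], X[j]).

    Only `block` rows of Q are held in memory at any time.
    """
    n = X.shape[0]
    for start in range(0, n, block):
        stop = min(start + block, n)
        # (stop - start, n) slice of the Gram matrix, scaled by the labels
        q_block = (X[start:stop] @ X.T) * np.outer(y[start:stop], y)
        yield start, q_block


# Small stand-in for the 35k x 40 dataset
rng = np.random.default_rng(0)
X = rng.random((7, 4))
y = rng.choice([-1.0, 1.0], size=7)

# Reference: the full matrix built in one go
full = (X @ X.T) * np.outer(y, y)

# Rebuild it from blocks of 3 rows to check that the pieces line up
pieces = [blk for _, blk in scaled_gram_rows(X, y, block=3)]
rebuilt = np.vstack(pieces)
```

Each row of a block can then be turned into an `[indices, values]` pair with `row.tolist()` before it goes to CPLEX. This only trims the numpy-side copies: a dense 35000 x 35000 matrix of float64 values is still about 9.8 GB, and CPLEX has to hold Q as well.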

SVM with Python and CPLEX: loading the quadratic part of the objective function