Question

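I am implementing a sparse matrix factorization routine in Python that produces a sparse score matrix and a dense feature matrix. scikit-learn has a direct implementation, sklearn.decomposition.DictionaryLearning, which works well: for a 1000x947 matrix it returns a result in about 20 seconds on a 16-core AMD system with 16 GB of RAM. Although it is fast and easy to use, I need to add more constraints to the problem, which requires something like the CVXPY toolbox. I am using the "SCS" solver in CVXPY, but the alternating minimization formulation for the sparse score matrix and the dense feature matrix does not even run on my machine, and it takes several days on an HPC cluster. What am I missing?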
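For reference, the two subproblems that the CVXPY loop in this question alternates between can be read off its objective line. The symbols are shorthand introduced here, not names from the code: X stands for data_RA (m x n), U for U_RA (m x k), V for V_RA (k x n), and \alpha for alpha_value_RA. cp.pnorm(U_RA, 1) is taken to be the entrywise sum of absolute values.

```latex
\begin{aligned}
\text{odd iterations ($U$ fixed):}\quad
  & \min_{V \ge 0}\; \lVert X - UV \rVert_F + \alpha \sum_{i,j} \lvert U_{ij} \rvert \\
\text{even iterations ($V$ fixed):}\quad
  & \min_{U \ge 0}\; \lVert X - UV \rVert_F + \alpha \sum_{i,j} \lvert U_{ij} \rvert
\end{aligned}
```

In the odd iterations the penalty term does not depend on V, so it only shifts the reported objective value by a constant.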


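    # scikit-learn implementation (assuming `import sklearn.decomposition as skd`)
    # Note: dict_learning returns a tuple (code, dictionary, errors), not an estimator object.
    estimator_RA = skd.dict_learning(data_RA, n_components=number_of_components_RA, alpha=alpha_value_RA, n_jobs=-1,
                                     verbose=False, positive_dict=True, positive_code=True)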


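    # CVXPY implementation (assuming `import cvxpy as cp` and `import numpy as np`;
    # U_RA must already hold a NumPy array of shape (m_RA, k_RA) before the loop starts)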
    MAX_ITERS = 10
    residual = np.zeros(MAX_ITERS)
    for iter_num in range(1, 1 + MAX_ITERS):
        # At the beginning of an iteration, U and V are NumPy
        # array types, NOT CVXPY variables.

        # For odd iterations, treat U constant, optimize over V.
        if iter_num % 2 == 1:
            V_RA = cp.Variable((k_RA, n_RA))
            constraint = [V_RA >= 0]
            # constraint += [cp.norm2(V_RA, 1) <= np.ones((V_RA.shape[0],))]

            print('Estimating V')
        # For even iterations, treat V constant, optimize over U.
        else:
            U_RA = cp.Variable((m_RA, k_RA))
            constraint = [U_RA >= 0]

            print('Estimating U')

        # Solve the problem.
        # increase max iters otherwise, a few iterations are "OPTIMAL_INACCURATE"
        # (eg a few of the entries in U or V are negative beyond standard tolerances)
        # Use @ for the matrix product; * for matrix multiplication is deprecated in recent CVXPY.
        obj = cp.Minimize(cp.norm(data_RA - U_RA @ V_RA, 'fro') + alpha_value_RA * cp.pnorm(U_RA, 1))


        prob = cp.Problem(obj, constraint)
        prob.solve(solver='SCS', max_iters=1000, verbose=True, use_indirect=True)

        if prob.status != cp.OPTIMAL:
            raise Exception("Solver did not converge!")

        print('Iteration {}, residual norm {}'.format(iter_num, prob.value))
        residual[iter_num - 1] = prob.value

        # Convert variable to NumPy array constant for next iteration.
        if iter_num % 2 == 1:
            V_RA = V_RA.value
        else:
            U_RA = U_RA.value
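For comparison, here is a minimal NumPy-only sketch of the same alternating idea using proximal gradient steps instead of a general-purpose conic solver. It is not scikit-learn's algorithm and it is not equivalent to the CVXPY loop above: it assumes a squared Frobenius loss (0.5 * ||X - UV||_F^2), takes a single gradient step per block instead of solving each subproblem to optimality, and bounds each row of V by unit norm, loosely mirroring the commented-out constraint. The function names, the random initialization and the synthetic data are all hypothetical.

```python
import numpy as np


def objective(X, U, V, alpha):
    """0.5 * ||X - U V||_F^2 + alpha * sum |U_ij| (squared loss is an assumption)."""
    return 0.5 * np.linalg.norm(X - U @ V, "fro") ** 2 + alpha * np.abs(U).sum()


def project_rows(V):
    """Clip to V >= 0, then rescale any row whose Euclidean norm exceeds 1."""
    V = np.maximum(V, 0.0)
    return V / np.maximum(1.0, np.linalg.norm(V, axis=1, keepdims=True))


def alternating_prox_grad(X, k, alpha, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))                 # nonnegative random start (hypothetical choice)
    V = project_rows(rng.random((k, n)))
    history = [objective(X, U, V, alpha)]
    for _ in range(n_iters):
        # V-step: U is constant. One projected gradient step with step size 1/L_v,
        # where L_v bounds the Lipschitz constant of the gradient with respect to V.
        L_v = np.linalg.norm(U.T @ U, 2) + 1e-12
        V = project_rows(V - (U.T @ (U @ V - X)) / L_v)

        # U-step: V is constant. Gradient step on the smooth part, followed by the
        # proximal map of alpha * ||U||_1 restricted to U >= 0 (shift, then clip at 0).
        L_u = np.linalg.norm(V @ V.T, 2) + 1e-12
        U = np.maximum(U - ((U @ V - X) @ V.T) / L_u - alpha / L_u, 0.0)

        history.append(objective(X, U, V, alpha))
    return U, V, np.array(history)


# Small synthetic example (made-up data, much smaller than the 1000x947 matrix).
data_rng = np.random.default_rng(1)
X = data_rng.random((30, 4)) @ data_rng.random((4, 20))
U, V, history = alternating_prox_grad(X, k=4, alpha=0.1)
print("objective at start:", history[0], "objective at end:", history[-1])
```

Each iteration here costs only a handful of dense matrix products, whereas every prob.solve call in the loop above builds and solves a full cone program over all entries of U_RA or V_RA. Whether that difference accounts for the observed runtimes is not something this sketch establishes.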

Why is the scikit-learn implementation of dictionary learning so much faster than the CVXPY one?

0 answers: