Parameters not being passed to R via rpy2

Date: 2015-08-26 22:56:49

Tags: python r rpy2

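I'm having some trouble using rpy2 with the R library "e1071". I'm trying to retrieve the probability data from an SVM prediction, but it is never included in the returned object.

Building the model by calling "svm" with "probability = TRUE" tells the model to include the extra data when predictions are requested. Predictions are then obtained with the "predict" command, also called with the "probability = TRUE" argument, which should return a complex data structure with the labels and a "probabilities" attribute. My problem is that the probabilities attribute is not included in the result. It is as if the probability argument were never included in the predict call.

Here is some sample code (the e1071 R library must be installed):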

import numpy
import rpy2
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
from rpy2.robjects.packages import importr
importr('e1071')


# configure the data set
SAMPLES = 50
trainingDataClassless = numpy.random.random((SAMPLES, 7))
trainingDataClasses = numpy.where(numpy.random.random((SAMPLES, 1)) > 0.5, 0.0, 1.0)
trainingDataFactorClasses = rpy2.robjects.FactorVector(trainingDataClasses)

# create the args for the svm
svmargs = {"x": trainingDataClassless, "y": trainingDataFactorClasses, "probability": True,
           "kernel": "linear", "type": "C-classification"}

print("Starting SVM with parameters: %s" % (svmargs,))
svmObj = rpy2.robjects.r['svm'](**svmargs)

print("SVM Analysis")
predictOutcomes = rpy2.robjects.r['predict'](svmObj, trainingDataClassless, probability=True)
print("outcomes: %s" % (predictOutcomes,))
probs = rpy2.robjects.r['attr'](predictOutcomes, "probabilities")
print("probs: %s" % (probs,)) # should NOT be NULL!

For more information on the predict function in R (with a working probability example), see page 39 of the e1071 documentation.

2 Answers:

Answer 0 (score: 2)

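The attribute appears to get lost somewhere, possibly during the conversion between the low-level and high-level representations of the resulting R object (a factor).

Calling through the low-level interface is a workaround (see below), but it would be great if you could report the problem on the rpy2 issue tracker on Bitbucket.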

r_predict = rpy2.robjects.rinterface.globalenv.get('predict')
r_traindata = rpy2.robjects.Matrix(trainingDataClassless)
r_true = rpy2.robjects.BoolVector([True])
predictOutcomes = r_predict(svmObj,
                            r_traindata,
                            probability=r_true)

Edit: the issue was opened ... and has since been closed (bug fixed - https://bitbucket.org/rpy2/rpy2/issues/299)
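The kind of attribute loss described in this answer can be pictured with a toy model. The sketch below is not rpy2's actual conversion code; `ToyRObject`, `wrap_lossy`, and `wrap_preserving` are hypothetical names made up for illustration. It only assumes that an R object is a set of values plus a table of attributes, and that a wrapper which rebuilds the object from the attributes it expects a factor to have will silently drop anything extra, such as "probabilities".

```python
from dataclasses import dataclass, field


@dataclass
class ToyRObject:
    """Hypothetical stand-in for an R object: values plus an attribute table."""
    values: list
    attributes: dict = field(default_factory=dict)


def wrap_lossy(obj):
    """Rebuild the object keeping only the attributes a plain factor has."""
    kept = {k: v for k, v in obj.attributes.items() if k in ("levels", "class")}
    return ToyRObject(list(obj.values), kept)


def wrap_preserving(obj):
    """Rebuild the object and carry every attribute across unchanged."""
    return ToyRObject(list(obj.values), dict(obj.attributes))


# A prediction shaped like the one R returns: factor codes plus an
# extra "probabilities" attribute (the numbers are made up).
prediction = ToyRObject(
    values=[1, 2, 1],
    attributes={
        "levels": ["0", "1"],
        "class": "factor",
        "probabilities": [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]],
    },
)

lossy = wrap_lossy(prediction)
kept = wrap_preserving(prediction)

print(lossy.attributes.get("probabilities"))  # attribute was dropped
print(kept.attributes.get("probabilities"))   # attribute survived
```

In this picture, the low-level workaround above corresponds to avoiding the lossy wrapping step, so the object keeps whatever attributes R attached to it.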

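Answer 1 (score: 0)

Your R functions (svm, predict) need to run on the R side rather than in Python, because Python does not see or know about those specialized functions. Python can be used for the numpy sample computation, the plumbing of the function calls, and printing the results: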

# PASS PYTHON DATASET OBJECTS INTO R
# (reuses trainingDataClassless and trainingDataFactorClasses from the question)
# numpy objects => R matrices (the numpy2ri conversion is already active)
rpy2.robjects.r.assign("tdClassless", trainingDataClassless)
# the class labels are already an R factor
rpy2.robjects.r.assign("tdFactorClasses", trainingDataFactorClasses)

# PASS SVM PARAMETERS AND ASSIGN svmObj IN R
rpy2.robjects.r('''
svmObj <- svm(x = tdClassless, y = tdFactorClasses, probability = TRUE,
              kernel = "linear", type = "C-classification")
''')

# PASS PREDICT PARAMETERS, KEEPING THE RESULT IN R
rpy2.robjects.r('predictOutcomes <- predict(svmObj, tdClassless, probability = TRUE)')

# READ THE PROBABILITIES ATTRIBUTE ON THE R SIDE, THEN PRINT IT FROM PYTHON
probs = rpy2.robjects.r('attr(predictOutcomes, "probabilities")')
print("probs: %s" % (probs,))