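I'm having some trouble using rpy2 with the R library "e1071". I'm trying to retrieve probability data from an SVM prediction, but it is never included in the returned object.
Building the model by calling "svm" with "probability = TRUE" tells the model to include the extra data when predictions are requested. Predictions are then requested through the "predict" command with the "probability = TRUE" argument, which should return a complex data structure with the labels and a "probabilities" attribute. My problem is that the probabilities attribute is not included in the result. It is as if the probability argument were never passed to the predict call.
Here is some sample code (the e1071 R library must be installed):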
import numpy
import rpy2
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
from rpy2.robjects.packages import importr
importr('e1071')
# configure the data set
SAMPLES = 50
trainingDataClassless = numpy.random.random((SAMPLES, 7))
trainingDataClasses = numpy.where(numpy.random.random((SAMPLES, 1)) > 0.5, 0.0, 1.0)
trainingDataFactorClasses = rpy2.robjects.FactorVector(trainingDataClasses)
# create the args for the svm
svmargs = {"x": trainingDataClassless, "y": trainingDataFactorClasses, "probability": True,
"kernel": "linear", "type": "C-classification"}
print("Starting SVM with parameters: %s" % (svmargs,))
svmObj = rpy2.robjects.r['svm'](**svmargs)
print("SVM Analysis")
predictOutcomes = rpy2.robjects.r['predict'](svmObj, trainingDataClassless, probability=True)
print("outcomes: %s" % (predictOutcomes,))
probs = rpy2.robjects.r['attr'](predictOutcomes, "probabilities")
print("probs: %s" % (probs,)) # should NOT be NULL!
For more information on the predict function in R (with a working probability example), see page 39 of the e1071 documentation.
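For orientation, the e1071 documentation describes the result of predict with probability = TRUE as a vector of labels carrying a "probabilities" attribute: a matrix with one row per predicted sample and one column per class. The sketch below only mimics that shape in plain Python. The class name RLikeFactor and all the numbers are made up for illustration; none of it is output from e1071 or part of rpy2.

```python
from dataclasses import dataclass, field


@dataclass
class RLikeFactor:
    """Hypothetical stand-in for an R factor with attached attributes."""
    labels: list
    attributes: dict = field(default_factory=dict)


# Made-up example: 3 predicted samples, 2 classes ("0" and "1").
prediction = RLikeFactor(
    labels=["0", "1", "1"],
    attributes={
        "probabilities": {
            "colnames": ["0", "1"],          # one column per class
            "rows": [[0.75, 0.25],           # one row per sample
                     [0.25, 0.75],
                     [0.125, 0.875]],
        }
    },
)

# A missing attribute yields None here, playing the role of R's NULL.
probs = prediction.attributes.get("probabilities")
n_samples = len(probs["rows"])
n_classes = len(probs["colnames"])
print(n_samples, n_classes)
```

In the question's code, it is this attribute lookup that comes back empty even though the labels themselves are returned.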
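Answer 0 (score: 2)
The attribute appears to get lost somewhere, possibly when converting between the low-level and high-level representations of the resulting R object (a factor).
Calling through the low-level interface is a workaround (see below), but it would be great if you could report the problem on the rpy2 issue tracker on bitbucket.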
r_predict = rpy2.robjects.rinterface.globalenv.get('predict')
r_traindata = rpy2.robjects.Matrix(trainingDataClassless)
r_true = rpy2.robjects.BoolVector([True])
predictOutcomes = r_predict(svmObj,
r_traindata,
probability=r_true)
Edit: the issue was opened... and has since been closed (bug fixed - https://bitbucket.org/rpy2/rpy2/issues/299)
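As a toy illustration of the kind of loss this answer suspects, here is a minimal sketch in plain Python. It is not rpy2's actual conversion code: the classes LowLevelVector and HighLevelFactor and both convert functions are hypothetical, and the numbers are made up. It only shows how a conversion that rebuilds an object from a fixed set of known attributes can silently drop an extra one such as "probabilities".

```python
class LowLevelVector:
    """Hypothetical low-level object: raw values plus every attached attribute."""

    def __init__(self, values, attributes):
        self.values = list(values)
        self.attributes = dict(attributes)


class HighLevelFactor:
    """Hypothetical high-level wrapper built from a low-level object."""

    def __init__(self, values, attributes):
        self.values = list(values)
        self.attributes = dict(attributes)


def lossy_convert(low):
    # Keeps only the attributes a factor is known to need; extras are dropped.
    kept = {k: v for k, v in low.attributes.items() if k in ("levels", "class")}
    return HighLevelFactor(low.values, kept)


def faithful_convert(low):
    # Copies every attribute, so extras such as "probabilities" survive.
    return HighLevelFactor(low.values, low.attributes)


low = LowLevelVector(
    values=[1, 2, 2],
    attributes={
        "levels": ["0", "1"],
        "class": "factor",
        "probabilities": [[0.75, 0.25], [0.25, 0.75], [0.125, 0.875]],
    },
)

lossy = lossy_convert(low)
faithful = faithful_convert(low)
print(lossy.attributes.get("probabilities"))     # missing, like R's NULL
print(faithful.attributes.get("probabilities"))  # the matrix is still attached
```

Staying at the low level, as the workaround does, corresponds to never going through the lossy step at all.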
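Answer 1 (score: 0)
Your R functions (svm and predict) need to run on the R side rather than in Python, because Python does not see or know about those specialized functions. Python can be used for the numpy sample calculations, the plumbing that calls the functions, and printing the results: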
# PASS PYTHON DATASET OBJECTS INTO R
# numpy array => R matrix
tdClassless_row, tdClassless_col = trainingDataClassless.shape
rmatrix_tdClassless = rpy2.robjects.r.matrix(trainingDataClassless,
    nrow=tdClassless_row, ncol=tdClassless_col)
rpy2.robjects.r.assign("tdClassless", rmatrix_tdClassless)
# the class labels are already an R factor, so assign them directly
rpy2.robjects.r.assign("tdFactorClasses", trainingDataFactorClasses)
# RUN SVM ON THE R SIDE AND ASSIGN svmObj in R
rpy2.robjects.r('svmObj <- e1071::svm(x = tdClassless, y = tdFactorClasses, '
                'probability = TRUE, kernel = "linear", '
                'type = "C-classification")')
# RUN PREDICT ON THE R SIDE
predictOutcomes = rpy2.robjects.r(
    'predict(svmObj, tdClassless, probability = TRUE)')
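The approach in this answer boils down to handing R source text to R. If the arguments live in Python, that text can be generated rather than typed by hand. The sketch below is one hypothetical way to do it in plain Python: RName, to_r_literal and r_call are illustrative names, not part of rpy2, and only the value types used in this thread are handled.

```python
class RName:
    """Hypothetical marker for a bare R identifier (emitted without quotes)."""

    def __init__(self, name):
        self.name = name


def to_r_literal(value):
    """Render a Python value as R source text (covers only the types used here)."""
    if isinstance(value, RName):
        return value.name
    if isinstance(value, bool):  # check bool before numbers: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % escaped
    raise TypeError("unsupported type: %r" % type(value))


def r_call(func, *args, **kwargs):
    """Build the source text of an R function call."""
    parts = [to_r_literal(a) for a in args]
    parts += ["%s = %s" % (k, to_r_literal(v)) for k, v in kwargs.items()]
    return "%s(%s)" % (func, ", ".join(parts))


svm_src = r_call("svm", x=RName("tdClassless"), y=RName("tdFactorClasses"),
                 probability=True, kernel="linear", type="C-classification")
predict_src = r_call("predict", RName("svmObj"), RName("tdClassless"),
                     probability=True)
print(svm_src)
print(predict_src)
```

A string built this way can be evaluated with rpy2.robjects.r(...), for example after prefixing "svmObj <- " to the svm call so the model stays in the R global environment.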