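I can't call the R function cforest (package partykit) from Python through rpy2. I suspect this is somehow related to the problem here. As far as I can tell, the issue lies with the formula argument (e.g. formula = y ~ 1 + x1 + x2). Everything else seems to be correct, since I can call the function lm (library stats), which also takes a formula argument.
The code below shows what I am trying to do (set method = 0, 1, or 2 for the different calling styles, and method = 3 to test the lm function).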
method = 1
import pandas as pd
import numpy as np
import subprocess
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr
import pandas.rpy.common as com
from rpy2.robjects import Formula
X_train = np.random.rand(500,6)
y_train = np.random.rand(500,1)
ntree = 2
mtry = 5
pk = importr('partykit')
stats = importr('stats')
base = importr('base')
#create dataframes in Python, assign labels consistent with formula below
nx = X_train.shape[1]
columns = ['y']
for i in range(nx):
    columns.append('x' + str(i))
datatrain = pd.DataFrame(data=np.hstack((y_train, X_train)), columns=columns)
#convert to R dataframe
r_datatrain = com.convert_to_r_dataframe(datatrain)
#arguments
ctrl = pk.ctree_control(mtry = mtry)
if method == 0:
    robjects.r('''
        f <- function(data, ntree, mtry, verbose=FALSE) {
            if (verbose) {
                cat("I am calling f().\n")
            }
            ctrl = ctree_control(mtry = mtry)
            cforest(formula = y ~ ., data = data, ntree = ntree, control = ctrl)
        }
        ''')
    r_f = robjects.r('f')
    obj = r_f(r_datatrain, ntree, mtry, True)
elif method == 1:
    #arguments
    obj = pk.cforest('formula = y ~ 1 + x1 + x2', data = r_datatrain, ntree = ntree, control = ctrl)
elif method == 2:
    fmla = Formula('x1 ~ x2')
    env = fmla.environment
    env['ntree'] = ntree
    env['ctrl'] = ctrl
    env['r_datatrain'] = r_datatrain
    obj = robjects.r('cforest(%s, data = r_datatrain, ntree = ntree, control = ctrl)' %fmla.r_repr())
    #obj = pk.cforest("formula = y ~ 1 + x1 + x2", data = r_datatrain, ntree = ntree, control = ctrl)
else:
    obj = stats.lm("formula = y ~ 1 + x1 + x2", data = r_datatrain)
print(obj)
Error messages
method = 0
I am calling f().
/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py:106: UserWarning: Error in .cnode(1L, data, infl, inputs, weights, ctrl) :
R_ExpCovLinstat: y does not have 500 rows
res = super(Function, self).__call__(*new_args, **new_kwargs)
Traceback (most recent call last):
File "r2py_issues.py", line 47, in <module>
obj = r_f(r_datatrain, ntree, mtry, True)
File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in .cnode(1L, data, infl, inputs, weights, ctrl) :
R_ExpCovLinstat: y does not have 500 rows
method = 1
/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py:106: UserWarning: Error: inherits(object, "formula") is not TRUE
res = super(Function, self).__call__(*new_args, **new_kwargs)
Traceback (most recent call last):
File "r2py_issues.py", line 50, in <module>
obj = pk.cforest('formula = y ~ 1 + x1 + x2', data = r_datatrain, ntree = ntree, control = ctrl)
File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error: inherits(object, "formula") is not TRUE
method = 2
/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py:106: UserWarning: Error in .cnode(1L, data, infl, inputs, weights, ctrl) :
R_ExpCovLinstat: y does not have 500 rows
res = super(Function, self).__call__(*new_args, **new_kwargs)
Traceback (most recent call last):
File "r2py_issues.py", line 58, in <module>
obj = robjects.r('cforest(%s, data = r_datatrain, ntree = ntree, control = ctrl)' %fmla.r_repr())
File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/__init__.py", line 321, in __call__
res = self.eval(p)
File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in .cnode(1L, data, infl, inputs, weights, ctrl) :
R_ExpCovLinstat: y does not have 500 rows
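Answer 0 (score: 0)
Troubleshooting code may require simplifying it enough to isolate the source of the problem.
For example, the data frame conversion can be taken out of the problem to check whether the rest (here, the call to the learner cforest) works.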
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr
from rpy2.robjects import Formula
ntree = 2
mtry = 5
pk = importr('partykit')
#arguments
ctrl = pk.ctree_control(mtry = mtry)
r_datatrain = robjects.r("""
    data.frame(y=rnorm(100),
               x1=rnorm(100),
               x2=rnorm(100))
""")
obj = pk.cforest(formula = Formula('y ~ 1 + x1 + x2'),
                 data = r_datatrain,
                 ntree = ntree,
                 control = ctrl)
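This code works here.
Now you can progressively add elements from your intended code (e.g. the conversion from a pandas data frame, or a dataset closer to your example) until it breaks. That is exactly where you will have to troubleshoot.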
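One inexpensive check to add while reintroducing pieces is to confirm, on the Python side, that every variable named in the formula actually exists as a column before anything is handed to R. The following is only a minimal sketch that assumes simple additive formulas; the helpers `formula_terms` and `missing_columns` are hypothetical and are not part of rpy2 or partykit.

```python
import re


def formula_terms(formula):
    """Return the variable names used in a simple R-style formula string.

    Only additive formulas such as 'y ~ 1 + x1 + x2' are handled.
    Intercept markers (0, 1) and the '.' shorthand are not variables.
    """
    if "~" not in formula:
        raise ValueError("not a formula: %r" % formula)
    names = re.findall(r"[A-Za-z.][A-Za-z0-9._]*", formula)
    return [n for n in names if n != "."]


def missing_columns(formula, columns):
    """List the formula variables that are absent from the column names."""
    available = set(columns)
    return [n for n in formula_terms(formula) if n not in available]


# Column names as built in the question: 'y', 'x0', ..., 'x5'
columns = ["y"] + ["x" + str(i) for i in range(6)]

# Variables of this formula are all present in the columns
print(missing_columns("y ~ 1 + x1 + x2", columns))

# Dotted names such as 'x.1' are not among these columns
print(missing_columns("y ~ 1 + x.1 + x.2", columns))
```

A check like this only rules out naming mismatches; it says nothing about whether the converted R data frame has the column types that cforest expects.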
Answer 1 (score: 0)
I think I found a fix: essentially, use the code from the answer above with the following change. The lines
#create dataframe directly in R
r_datatrain = robjects.r(""" data.frame(y=rY, x=rX ) """)
are replaced with
#create dataframe directly in R
robjects.r("""
    df <- data.frame(y=rY,
                     x=rX
                     )
""")
r_datatrain = robjects.globalenv['df']
You can still call the cforest function with
#train
obj = pk.cforest(formula = Formula('y ~ 1 + x.1 + x.2'),
                 data = r_datatrain,
                 ntree = ntree,
                 control = ctrl)
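Note that the formula here uses `x.1` and `x.2` rather than `x1` and `x2`. Assuming `rX` is an R matrix without column names (its definition is not shown in this answer), `data.frame(y=rY, x=rX)` expands it into columns named `x.1`, `x.2`, and so on. A small sketch for building the matching formula string in Python follows; `build_formula` is a hypothetical helper, and the resulting string would still need to be wrapped in `Formula(...)` before being passed to `cforest`.

```python
def build_formula(response, prefix, n_predictors):
    """Build a string like 'y ~ 1 + x.1 + ... + x.n'.

    Assumes the predictor columns are named prefix.1 ... prefix.n,
    as when an unnamed R matrix is expanded by data.frame().
    """
    if n_predictors < 1:
        raise ValueError("need at least one predictor")
    predictors = ["%s.%d" % (prefix, i) for i in range(1, n_predictors + 1)]
    return "%s ~ 1 + %s" % (response, " + ".join(predictors))


# Two predictors, matching the formula used in the call above
formula_string = build_formula("y", "x", 2)
print(formula_string)
```

Generating the string from the number of predictors avoids retyping the formula when the width of the predictor matrix changes.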