Calling the R function cforest (partykit) from Python via rpy2

Date: 2016-04-28 22:59:59

Tags: python random-forest rpy2

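I cannot call the R function cforest (package partykit) from Python through rpy2. I suspect this is somehow related to the question linked here. As far as I can tell, the problem lies in the formula argument (e.g. formula = y ~ 1 + x1 + x2). Everything else seems to be correct, because I can call the function lm (library stats), which also takes a formula argument.

The code below shows what I am trying to do (set method = 0, 1, or 2 for the different calling styles, and method = 3 to test the lm function).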

method = 1    
import pandas as pd
import numpy as np
import subprocess

import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr
import pandas.rpy.common as com
from rpy2.robjects import Formula 


X_train = np.random.rand(500,6)
y_train = np.random.rand(500,1)
ntree = 2
mtry = 5


pk = importr('partykit')
stats = importr('stats')
base = importr('base')

#create dataframes in Python, assign labels consistent with formula below
nx = X_train.shape[1]    
columns = ['y']
for i in range(nx):
    columns.append('x' + str(i))
datatrain = pd.DataFrame(data=np.hstack((y_train, X_train)), columns=columns)

#convert to R dataframe
r_datatrain = com.convert_to_r_dataframe(datatrain)      

#arguments
ctrl = pk.ctree_control(mtry = mtry) 

if method == 0: 
  robjects.r('''
    f <- function(data, ntree, mtry, verbose=FALSE) {
        if (verbose) {
            cat("I am calling f().\n")
        }
    ctrl = ctree_control(mtry = mtry)  
        cforest(formula = y ~ ., data = data, ntree = ntree, control = ctrl)
        }
        ''')
  r_f = robjects.r('f')
  obj = r_f(r_datatrain, ntree, mtry, True)
elif method == 1:
  #arguments  
  obj = pk.cforest('formula = y ~ 1 + x1 + x2', data = r_datatrain, ntree = ntree, control = ctrl)
elif method == 2:
  fmla = Formula('x1 ~ x2')
  env = fmla.environment
  env['ntree'] = ntree
  env['ctrl'] = ctrl
  env['r_datatrain'] = r_datatrain

  obj = robjects.r('cforest(%s, data = r_datatrain, ntree = ntree, control = ctrl)' %fmla.r_repr())
  #obj = pk.cforest("formula = y ~ 1 + x1 + x2", data = r_datatrain, ntree = ntree, control = ctrl)
else:
  obj = stats.lm("formula = y ~ 1 + x1 + x2", data = r_datatrain)

print(obj)

Error messages

method = 0

I am calling f().
/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py:106: UserWarning: Error in .cnode(1L, data, infl, inputs, weights, ctrl) : 
  R_ExpCovLinstat: y does not have 500 rows

  res = super(Function, self).__call__(*new_args, **new_kwargs)
Traceback (most recent call last):
  File "r2py_issues.py", line 47, in <module>
    obj = r_f(r_datatrain, ntree, mtry, True)
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 178, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 106, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in .cnode(1L, data, infl, inputs, weights, ctrl) : 
  R_ExpCovLinstat: y does not have 500 rows

method = 1

/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py:106: UserWarning: Error: inherits(object, "formula") is not TRUE

  res = super(Function, self).__call__(*new_args, **new_kwargs)
Traceback (most recent call last):
  File "r2py_issues.py", line 50, in <module>
    obj = pk.cforest('formula = y ~ 1 + x1 + x2', data = r_datatrain, ntree = ntree, control = ctrl)
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 178, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 106, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error: inherits(object, "formula") is not TRUE

method = 2

/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py:106: UserWarning: Error in .cnode(1L, data, infl, inputs, weights, ctrl) : 
  R_ExpCovLinstat: y does not have 500 rows

  res = super(Function, self).__call__(*new_args, **new_kwargs)
Traceback (most recent call last):
  File "r2py_issues.py", line 58, in <module>
    obj = robjects.r('cforest(%s, data = r_datatrain, ntree = ntree, control = ctrl)' %fmla.r_repr())
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/__init__.py", line 321, in __call__
    res = self.eval(p)
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 178, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 106, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in .cnode(1L, data, infl, inputs, weights, ctrl) : 
  R_ExpCovLinstat: y does not have 500 rows

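2 answers:

Answer 0 (score: 0)

Troubleshooting code often requires reducing it until the source of the problem is isolated.

For example, the data frame conversion can be taken out of the picture to check whether the rest (here, the call to the learner cforest) works.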

import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr
from rpy2.robjects import Formula 

ntree = 2
mtry = 5

pk = importr('partykit')

#arguments
ctrl = pk.ctree_control(mtry = mtry) 

r_datatrain = robjects.r("""
data.frame(y=rnorm(100), 
           x1=rnorm(100), 
           x2=rnorm(100))
""")
obj = pk.cforest(formula = Formula('y ~ 1 + x1 + x2'),
                 data = r_datatrain,
                 ntree = ntree,
                 control = ctrl)

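This code works here.

You can now add elements of your intended code back in step by step (e.g. the conversion from a pandas data frame, or a dataset closer to your example) until it breaks. That is exactly where you need to troubleshoot.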
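The step-by-step approach described above can be sketched as a small harness that runs the stages in order and reports the first one that fails. This is only an illustration: `first_failing_stage` and the three stand-in stages are hypothetical names, not part of rpy2 or partykit, and the stand-ins use plain Python data so the sketch runs without R.

```python
def first_failing_stage(stages):
    """Run (name, callable) pairs in order, feeding each result to the next.

    Returns (name, exception) for the first stage that raises,
    or (None, None) if every stage succeeds.
    """
    value = None
    for name, stage in stages:
        try:
            value = stage(value)
        except Exception as exc:
            return name, exc
    return None, None


# Hypothetical stand-ins for the real steps (build data, convert, fit).
def make_data(_):
    return {"y": [0.1, 0.2, 0.3], "x1": [1.0, 2.0, 3.0]}


def convert(data):
    # Check that every column has the same number of rows.
    lengths = {len(column) for column in data.values()}
    if len(lengths) != 1:
        raise ValueError("columns differ in length")
    return data


def fit(data):
    # The model needs a column that the data does not contain.
    if "x2" not in data:
        raise KeyError("x2")
    return "model"


stages = [("make_data", make_data), ("convert", convert), ("fit", fit)]
failed, error = first_failing_stage(stages)
print(failed, repr(error))
```

In the real script, the stages would be the pieces of the question's code (creating the data frame, converting it to R, calling cforest), added one at a time to the working example.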

Answer 1 (score: 0)

I think I found a fix, which basically uses the code from the answer above with the following change. This line

#create dataframe directly in R
r_datatrain = robjects.r("""
data.frame(y=rY,
           x=rX)
""")

is replaced by

#create dataframe directly in R
robjects.r("""
      df <- data.frame(y=rY, 
       x=rX 
      ) 
""")
r_datatrain = robjects.globalenv['df']

You can still call the cforest function with

#train
obj = pk.cforest(formula = Formula('y ~ 1 + x.1 + x.2'),
                 data = r_datatrain,
                 ntree = ntree,
                 control = ctrl)
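
The names x.1 and x.2 in the formula are consistent with how R's data.frame() names the columns of a matrix passed as x when that matrix has no column names and at least two columns. The sketch below builds the formula string from those names so that it stays in sync with the data. It assumes rX is such an unnamed matrix, which the answer does not show; `r_matrix_column_names` and `formula_string` are hypothetical helper names.

```python
def r_matrix_column_names(prefix, ncol):
    """Column names R's data.frame() assigns to an unnamed matrix argument.

    Assumes ncol >= 2; names are 1-based: prefix.1, prefix.2, ...
    """
    return ["{}.{}".format(prefix, i) for i in range(1, ncol + 1)]


def formula_string(response, predictors):
    """Build an R formula string with an explicit intercept term."""
    return "{} ~ 1 + {}".format(response, " + ".join(predictors))


predictors = r_matrix_column_names("x", 2)
fmla = formula_string("y", predictors)
print(fmla)
```

The resulting string can then be wrapped as Formula(fmla) and passed as the formula argument, as in the call above.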