I am trying to call R's glm.nb from Python through rpy2:
import numpy as np
from rpy2 import robjects
from rpy2.robjects.packages import importr
MASS = importr('MASS')
stats = importr('stats')
def glm_nb(x, y):
    # Bind x and y in the formula's environment so R can resolve 'y~x'
    formula = robjects.Formula('y~x')
    env = formula.environment
    env["x"] = x
    env["y"] = y
    fitted = MASS.glm_nb(formula)
    # fitted = stats.glm(formula)
    return fitted
Test:
N = 100
x = np.random.rand(N)
y = x + np.random.poisson( 10, N)
fitted = glm_nb(x, np.round(y))
This returns the error:
104 for k, v in kwargs.items():
105 new_kwargs[k] = conversion.py2ri(v)
--> 106 res = super(Function, self).__call__(*new_args, **new_kwargs)
107 res = conversion.ri2ro(res)
108 return res
RRuntimeError: Error in x[good, , drop = FALSE] * w : non-conformable arrays
However, when I run a plain glm instead, it works fine. What could be the problem, and how can I debug it?
Answer 0 (score: 2)
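The underlying issue concerns the matrix and array data structures in R. Below, the error is reproduced in R along with fixes, followed by the challenge of replicating the fix in rpy2, and a working solution: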
R error and fixes
library(MASS)
# ARRAY
x <- array(rnorm(100))
y <- as.integer(x) + array(rpois(100, 10))
model2 <- glm.nb(y~x)
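Error in x[good, , drop = FALSE] * w : non-conformable arrays
However, three fixes are available: 1) use a matrix (a special two-dimensional type of array); 2) use an equivalently defined array (by specifying the dim argument); 3) convert to a matrix. Note: depending on the random values, a warning about reaching the iteration limit may appear, but the code still runs.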
# MATRIX
x <- matrix(rnorm(100))
y <- as.integer(x) + matrix(rpois(100, 10))
model1 <- glm.nb(y~x)
# EQUIVALENT ARRAY
x <- array(rnorm(100),c(100,1))
y <- as.integer(x) + matrix(rpois(100, 10),c(100,1))
model2 <- glm.nb(y~x)
# EXPLICIT MATRIX CONVERSION (USED IN WORKING SOLUTION)
x <- as.matrix(array(rnorm(100)))
y <- as.integer(x) + as.matrix(array(rpois(100, 10)))
model3 <- glm.nb(y~x)
Challenge
Python's rpy2 did not effectively pass numpy matrices to R as matrices in my script: both the simple glm() from stats and glm.nb() from MASS raised a different error:
import numpy as np
from rpy2 import robjects
from rpy2.robjects.packages import importr
from rpy2.robjects.numpy2ri import numpy2ri
MASS = importr('MASS')
#rpy2 + negative binomial glm
stats = importr('stats')
def glm_nb(x, y):
    formula = robjects.Formula('y~x')
    env = formula.environment
    env["x"] = x
    env["y"] = y
    fitted = MASS.glm_nb(formula)
    # fitted = stats.glm(formula)
    return fitted
N = 100
x = np.random.rand(N)
x = np.asmatrix(x) # PYTHON CONVERSION TO MATRIX
r_x = numpy2ri(x)
# REPLACED NP.ROUND FOR AS.TYPE() TO COMPARE WITH R
y = x.astype(int) + np.random.poisson(10, N)
y = np.asmatrix(y) # PYTHON CONVERSION TO MATRIX
r_y = numpy2ri(y)
fitted = glm_nb(r_x, r_y)
rpy2.rinterface.RRuntimeError: Error in glm.fitter(x = X, y = Y, w = w, start = start, etastart = etastart,  : object 'fit' not found
Even with numpy2ri.activate(), the numpy matrix cannot be converted:
from rpy2.robjects import numpy2ri
robjects.numpy2ri.activate()
r_x = numpy2ri.ri2py(x)
r_y = numpy2ri.ri2py(y)
NotImplementedError: Conversion 'ri2py' not defined for objects of type '<class 'numpy.matrixlib.defmatrix.matrix'>'
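One thing worth checking in the script above is the shape NumPy actually produces. This is an observation about NumPy, not something the original answer states: np.asmatrix applied to a 1-D array yields a single row of shape (1, N), whereas as.matrix() in R turns a length-N vector into an N x 1 column. Whether this alone explains the glm.fitter error is an assumption; the sketch below only compares the shapes.

```python
import numpy as np

N = 100
x = np.random.rand(N)       # 1-D array, shape (N,)

row = np.asmatrix(x)        # a 1-D input becomes a single ROW: shape (1, N)
col = x.reshape(-1, 1)      # explicit column: shape (N, 1), the layout as.matrix() gives in R

print(x.shape, row.shape, col.shape)
```

If a column layout is what the R side needs, reshaping to (N, 1) in Python is the closest analogue, though how a given rpy2 version converts it should be verified rather than assumed.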
Working solution
Simply interfacing through robjects.r() and letting R convert the array objects to matrices works. Recall the third fix above:
N = 100
x = np.random.rand(N)
r_x = numpy2ri(x)
y = x.astype(int) + np.random.poisson(10, N)
r_y = numpy2ri(y)
from rpy2.robjects import r
r.assign("y", r_y)
r.assign("x", r_x)
r("x <- as.matrix(x)")
r("y <- as.matrix(y)")
r("res <- glm.nb(y~x)")
r_result = r("res[1:5]")
# CONVERSION INTO PY DICTIONARY
from rpy2.robjects import pandas2ri
pandas2ri.activate()
pyresult = pandas2ri.ri2py(r_result)
print(pyresult) # OUTPUTS COEFF, RESID, FITTED VALS, EFFECTS, R
# OR OLDER DEPRECATED CONVERSION
import pandas.rpy.common as com
pyresult = com.convert_robj(r_result)
print(pyresult) # OUTPUTS COEFF, RESID, FITTED VALS, EFFECTS, R
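Command line solution
If your application allows it, simply call an R modeling script from Python as a command-line subprocess, bypassing the need for rpy2 altogether, and even pass arguments as needed: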
from subprocess import Popen, PIPE
command = 'Rscript.exe'
path2Script = 'path/to/Script.R'
args = ['arg1', 'arg2', 'arg3']
cmd = [command, path2Script] + args
p = Popen(cmd,stdin= PIPE, stdout= PIPE, stderr= PIPE)
output,error = p.communicate()
if p.returncode == 0:
    print('R OUTPUT:\n {0}'.format(output))
else:
    print('R ERROR:\n {0}'.format(error))
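On newer Python versions (3.7+), the same pattern can be written with subprocess.run, which also decodes the output to text. The following is a minimal sketch, not part of the original answer: run_script is a hypothetical helper name, and the Python interpreter stands in for Rscript so the example runs without R installed.

```python
import subprocess
import sys

def run_script(cmd):
    """Run a command and return (returncode, stdout, stderr) as text."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr

# For R the command would look like ['Rscript', 'path/to/Script.R', 'arg1'],
# assuming Rscript is on PATH. sys.executable is only a stand-in here.
code, out, err = run_script([sys.executable, '-c', 'print("hello from child")'])
if code == 0:
    print('OUTPUT:\n{0}'.format(out))
else:
    print('ERROR:\n{0}'.format(err))
```

Passing the command as a list, as in the answer, avoids shell quoting issues with paths and arguments.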
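Answer 1 (score: 1)
What is happening is that the underlying R code expects "vectors" rather than arrays, but the Python objects are arrays.
A simple workaround is to give the R function you are calling in package MASS what it wants/expects. The following line in your test can be changed: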
fitted = glm_nb(x, np.round(y))
...to this:
import array
fitted = glm_nb(array.array('f', x), array.array('f', np.round(y)))
...or this:
from rpy2.robjects.vectors import FloatVector
fitted = glm_nb(FloatVector(x), FloatVector(np.round(y)))
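A small caveat on the array.array variant, added here as a side note rather than taken from the answer: the typecode 'f' stores single-precision C floats, while 'd' stores doubles. The sketch below only illustrates the Python-side storage; how rpy2 maps each typecode to an R vector is not asserted here.

```python
import array

values = [0.1, 2.0, 3.5]

single = array.array('f', values)   # C float (single precision)
double = array.array('d', values)   # C double, same precision as a Python float

# 0.1 is not exactly representable, so single precision rounds it differently
print(single[0] == 0.1)   # False
print(double[0] == 0.1)   # True
```

Small rounded counts such as the y values in the test are represented exactly either way, so the difference matters mainly for the continuous predictor x.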