Error when running an R function with rpy2

Time: 2016-04-11 15:12:37

Tags: python r rpy2

I am trying to run the multi.split function from the questionr package using rpy2.

Here is my code:

from rpy2 import robjects
from rpy2.robjects.packages import importr

questionr = importr(str('questionr'))

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')

After the last line runs, I get the following error:

RRuntimeError: Error in `colnames<-`(`*tmp*`, value = c("c(\"red/blue\",_\"green\",_\"red/green\",_\"blue/red\",_\"red/blue\",_\"green\",_.blue",  : 
 'names' attribute [4] must be the same length as the vector [3]

I think it is related to the size of the vector I am sending, because if I remove the last item

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue"]

and then run

data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')

I do not get any error message. If I change the 'split_char' argument, for example:

data_table = multi_split(data_vector, split_char='.')

I do not get any error message either, regardless of the size of the array I send.

I tried running the equivalent code directly in R (using R-Studio), and it runs without problems. Any ideas on how to solve this?

1 answer:

Answer 0: (score: 1)

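This seems to happen because the function multi_split (multi.split in the R package) tries to use the string representation of the expression associated with its first argument (here "data_vector").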

The signature of the R function is:

multi.split(var, split.char = "/", mnames = NULL)

The documentation for mnames says:

  

Names to give to the produced variables. If NULL, the names are computed from the original variable name and the answers.

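In the call multi_split(data_vector, split_char='/'), the embedded R cannot see the variable name, because this is a Python call and data_vector reaches R as an anonymous object (content only, no variable name).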

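I thought you could work around this by specifying mnames, but you checked and it does not work (see the comments below). That matches what the code seems to say: the line vname <- deparse(substitute(var)) is evaluated whether or not mnames is specified: https://github.com/juba/questionr/blob/9cf09f3ffcd6c8df24182380f12d52b061c221ef/R/table.multi.R#L161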
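
One possible reading of the length mismatch in the error message, offered as a guess rather than a confirmed diagnosis: in R, deparse() returns several strings when the expression is long, so for an anonymous 18-element vector vname may hold more than one string. If the column names are then built by pasting vname onto each distinct answer (an assumption based on the mnames documentation, not a copy of questionr's code), R's recycling produces as many names as the longer input. The toy Python model below only imitates that recycling; r_paste, distinct_answers and the four placeholder pieces are hypothetical, with the count of four taken from the "[4]" in the error message.

```python
def r_paste(left, right, sep="."):
    """Toy model of R's paste(): the shorter input is recycled.

    Assumes both inputs are non-empty.
    """
    n = max(len(left), len(right))
    return [f"{left[i % len(left)]}{sep}{right[i % len(right)]}" for i in range(n)]


def distinct_answers(values, split_char="/"):
    """Distinct answers across all values, in first-seen order."""
    seen = []
    for value in sorted(set(values)):
        for part in value.split(split_char):
            if part not in seen:
                seen.append(part)
    return seen


# The 18 values from the question.
data = ["red/blue", "green", "red/green", "blue/red"] * 4 + ["red/blue", "green"]

answers = distinct_answers(data)  # one output column per distinct answer

# Hypothetical stand-ins for the strings deparse() might return.
vname = ["piece1", "piece2", "piece3", "piece4"]

names = r_paste(vname, answers)
print(len(names), len(answers))  # prints: 4 3
```

With four names for three columns, assigning the column names would fail in the way the error describes. With split_char='.', no value is split, so there are four distinct answers and the counts happen to agree.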

Another approach is to build and evaluate an R expression. An older post should provide the necessary bits for this: What object to pass to R from rpy2?

A third possibility is to creatively mix in Python strings as R code:

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
# binding the R vector to a symbol in R's "GlobalEnv"
robjects.globalenv['mydata'] = data_vector
# the call is now in a Python string that is evaluated as R code
# (the R code must refer to the R symbol 'mydata' and use R's argument name 'split.char')
data_table = robjects.r("multi.split(mydata, split.char='/')")
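
The same idea can be wrapped in a small helper. This is a minimal sketch, not part of rpy2 or questionr: build_multi_split_call and run_multi_split are hypothetical names, the name check is only a rough approximation of what R accepts as a variable name, and the call is qualified with questionr:: so that it does not depend on the package being attached. rpy2 is imported lazily, so the string-building part can be tried without R installed.

```python
import json
import re

# Rough check for a plain R variable name (an approximation, not R's full grammar).
_R_NAME = re.compile(r"[A-Za-z.][A-Za-z0-9._]*")


def build_multi_split_call(symbol, split_char="/"):
    """Return R source that calls multi.split on a named R variable."""
    if not _R_NAME.fullmatch(symbol):
        raise ValueError(f"not a plain R variable name: {symbol!r}")
    # json.dumps gives a double-quoted literal; adequate for simple ASCII separators.
    return f"questionr::multi.split({symbol}, split.char={json.dumps(split_char)})"


def run_multi_split(values, symbol="mydata", split_char="/"):
    """Bind values to `symbol` in R's global environment, then evaluate the call."""
    from rpy2 import robjects  # lazy import: requires rpy2, R and questionr

    robjects.globalenv[symbol] = robjects.StrVector(values)
    return robjects.r(build_multi_split_call(symbol, split_char))


print(build_multi_split_call("mydata"))
```

Because the vector now has a name on the R side, deparse(substitute(var)) sees mydata rather than the vector's contents. Calling run_multi_split(data) should then behave like the string-based call above, with column names derived from mydata.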