Calling an R library dynamically from Python using rpy2

Date: 2017-07-02 17:49:35

Tags: python r numpy rpy2

Based on https://stackoverflow.com/a/44827220/1639834

I have an R routine that I need to call dynamically from my Python code. I plan to use rpy2 for this.

First, here is the R code I want to use from my Python code (I am a first-time R user):

Setting up dummy data to demonstrate how the R routine is used

 set.seed(101)
 data_sample <- c(5+ 3*rt(1000,df=5),
        10+1*rt(10000,df=20))

 num_components <- 2

The routine itself

library(teigen)
 tt <- teigen(data_sample,
        Gs=num_components,  
        scale=FALSE,dfupdate="numeric",
        models=c("univUU") 
 )

df = c(tt$parameters$df)
mean = c(tt$parameters$mean)
scale = c(tt$parameters$sigma)

The parameters data_sample and num_components are computed dynamically by my Python code, where num_components is simply an integer and data_sample is a numpy array.

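As the final goal, I would like to get df, mean and scale back into the "Python world" as lists or numpy arrays, so that I can process them further and use them in my program logic.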

My first experiment at solving this with rpy2 so far:

import rpy2
from rpy2.robjects.packages import importr
from rpy2 import robjects as ro

numpy_t_mix_samples = get_student_t_data(n_samples=10000)

r_t_mix_samples = ro.FloatVector(numpy_t_mix_samples)

teigen = importr('teigen')
rres = teigen.teigen(r_t_mix_samples, Gs=2, scale=False, dfupdate="numeric", models="univUU")

Here the Gs argument is still hard-coded, but it should become dynamic later.
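The helper get_student_t_data used in the snippet above is not shown in the question. For anyone who wants to run the experiment, the following is a hypothetical stand-in written with only the standard library. It assumes the helper is meant to mimic the R dummy data (a mixture of 5 + 3*t with df=5 and 10 + 1*t with df=20 in roughly a 1:10 ratio); the seed parameter and the student_t function are illustrative additions, not part of the original code. It returns a plain list, which can be wrapped with numpy.asarray if an array is needed.

```python
import math
import random


def student_t(rng, df):
    """Draw one Student-t variate as Z / sqrt(V / df), where V ~ chi-squared(df)."""
    z = rng.gauss(0.0, 1.0)
    # chi-squared(df) is a Gamma distribution with shape df/2 and scale 2
    v = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(v / df)


def get_student_t_data(n_samples=10000, seed=None):
    """Hypothetical stand-in for the question's helper (assumed behaviour)."""
    rng = random.Random(seed)
    n_first = n_samples // 11  # roughly the 1000:10000 ratio of the R example
    first = [5 + 3 * student_t(rng, 5) for _ in range(n_first)]
    second = [10 + 1 * student_t(rng, 20) for _ in range(n_samples - n_first)]
    return first + second
```

Because this draws from Python's generator rather than R's, the fitted parameters will not match the numbers produced by the R script with set.seed(101).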

Then, printing rres gives almost incomprehensible output (I guess because rpy2 does not render it properly):

R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
  iter: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x11e3fdd08 / R:0x7ff7cced0a28>
[156.000000]
  fuzzy: <class 'rpy2.robjects.vectors.Matrix'>
  R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x11e3fd8c8 / R:0x118e78000>
[0.000000, 0.917546, 0.004050, ..., 0.077300, 0.076273, 0.091252]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
  ...
  iter: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x11d632508 / R:0x7ff7cfa81658>
[-25365.912426]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]

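In summary, I would like to obtain the same results as in the original R example in the first code box, except that the df, mean and scale variables should be Python lists or numpy arrays. The fact that I do not know R at all makes using rpy2 very difficult, and perhaps there is a more elegant way to call this routine dynamically and get the results back in the Python world.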

1 Answer:

Answer 0: (score: 0)

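Consider using x.names.index('myname') to refer to nested named elements in an R object. See the rpy2 docs. As a reminder, and as demonstrated below, you can still use numeric indices to refer to nested objects in both R and Python.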
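If the names.index pattern is needed for more than one level, it could be wrapped in a small helper. The sketch below is only an illustration: r_get is a hypothetical name, and NamedList is a stand-in class that mimics just the .names attribute and integer indexing, so the snippet can be run without R installed. The values in fake_result are made up for the demonstration.

```python
def r_get(obj, *names):
    """Follow a chain of element names through nested R-style named lists."""
    for name in names:
        # Translate the name into its position, then index by position
        obj = obj[obj.names.index(name)]
    return obj


class NamedList:
    """Hypothetical stand-in mimicking only `.names` and integer indexing."""

    def __init__(self, **items):
        self.names = list(items)
        self._values = list(items.values())

    def __getitem__(self, i):
        return self._values[i]


# Made-up structure loosely shaped like the teigen result
fake_result = NamedList(
    logl=[-25365.9],
    parameters=NamedList(df=[3.58, 47.06], mean=[4.94, 10.0]),
)

print(r_get(fake_result, "parameters", "df"))  # [3.58, 47.06]
```

With a real rpy2 result, the same helper would presumably be called as r_get(rres, 'parameters', 'df'), since it relies only on the names.index lookup described above.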

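To reproduce the R object with exactly the same random data, we need to run set.seed on the R side, because there is no simple way to find equivalent random number generators across languages. See the related post. Finally, base R's as.vector() is used to convert array objects to vectors. All returned values in Python are R FloatVectors: <class 'rpy2.robjects.vectors.FloatVector'>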

Python

from rpy2.robjects.packages import importr

base = importr('base')
stats = importr('stats')
teigen = importr('teigen')

base.set_seed(101)
data_sample = base.as_numeric([(5+3*i) for i in stats.rt(1000,df=5)] + \
                              [(10+1*i) for i in stats.rt(10000,df=20)])

num_components = 2

rres = teigen.teigen(data_sample, Gs=num_components, scale=False, 
                     dfupdate="numeric", models="univUU")

# BY NUMBER INDEX
df = rres[2][0]
mean = base.as_vector(rres[2][1])
scale = base.as_vector(rres[2][3])

print(df)
# [1]  3.578491 47.059841
print(mean)
# [1]  4.939179 10.002038
print(scale)
# [1] 8.763076 1.041588


# BY NAME INDEX 
# (i.e., find corresponding number to name in R object)
params = rres[rres.names.index('parameters')]

df = params[params.names.index('df')]
mean = base.as_vector(params[params.names.index('mean')])
scale = base.as_vector(params[params.names.index('sigma')])

print(df)
# [1]  3.578491 47.059841
print(mean)
# [1]  4.939179 10.002038
print(scale)
# [1] 8.763076 1.041588

R (equivalent script)

library(teigen)

set.seed(101)
data_sample <- c(5+ 3*rt(1000,df=5),
                 10+1*rt(10000,df=20))
num_components <- 2

tt <- teigen(data_sample, Gs=num_components, scale=FALSE, 
             dfupdate="numeric", models="univUU")    

# BY NUMBER INDEX
df = tt[[3]][[1]]
mean = as.vector(tt[[3]][[2]])
scale = as.vector(tt[[3]][[4]])

print(df)
# [1]  3.578491 47.059841     
print(mean)
# [1]  4.939179 10.002038     
print(scale)
# [1] 8.763076 1.041588

# BY NAME INDEX
df = c(tt$parameters$df)
mean = c(tt$parameters$mean)
scale = c(tt$parameters$sigma)

print(df)
# [1]  3.578491 47.059841    
print(mean)
# [1]  4.939179 10.002038    
print(scale)
# [1] 8.763076 1.041588
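To connect the answer back to the question's goal (dynamic inputs in, plain Python values out), the name-index approach could be wrapped in a function along these lines. This is a sketch under assumptions: fit_teigen is a hypothetical name, the rpy2 imports are placed inside the function so that R is only started when the function is called, and it requires rpy2 plus the R package teigen to be installed. It uses only the calls already shown in the question and answer.

```python
def fit_teigen(data_sample, num_components):
    """Fit teigen on a sequence of floats and return (df, mean, scale) as lists."""
    # Imported lazily so that R is only initialised when the function runs
    from rpy2 import robjects as ro
    from rpy2.robjects.packages import importr

    base = importr('base')
    teigen = importr('teigen')

    # Works for lists and numpy arrays alike: convert every element to float
    r_data = ro.FloatVector([float(x) for x in data_sample])

    rres = teigen.teigen(r_data, Gs=int(num_components), scale=False,
                         dfupdate="numeric", models="univUU")

    params = rres[rres.names.index('parameters')]

    def as_list(name):
        # as.vector() flattens R arrays; list() yields plain Python floats
        return list(base.as_vector(params[params.names.index(name)]))

    return as_list('df'), as_list('mean'), as_list('sigma')
```

A call such as df, mean, scale = fit_teigen(numpy_t_mix_samples, 2) would then return three Python lists, which can be passed to numpy.asarray if arrays are preferred.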