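Based on https://stackoverflow.com/a/44827220/1639834:
I have an R routine that I need to call dynamically from my Python code. For this I plan to use rpy2.
First, here is the R code I want to use from Python (I'm a first-time R user):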
Set up dummy data to demonstrate how the R routine is used:
set.seed(101)
data_sample <- c(5+ 3*rt(1000,df=5),
10+1*rt(10000,df=20))
num_components <- 2
The routine itself:
library(teigen)
tt <- teigen(data_sample,
Gs=num_components,
scale=FALSE,dfupdate="numeric",
models=c("univUU")
)
df = c(tt$parameters$df)
mean = c(tt$parameters$mean)
scale = c(tt$parameters$sigma)
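The arguments data_sample and num_components are computed dynamically by my Python code, where num_components is just an integer and data_sample is a numpy array.
As the end goal, I want to get df, mean and scale back into the "Python world" as lists or numpy arrays, so that I can process them further and use them in my program logic.
My first attempt so far at solving this with rpy2: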
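from rpy2.robjects.packages import importr
from rpy2 import robjects as ro
# get_student_t_data() is defined elsewhere in my code (not shown) and returns a 1-D numpy array
numpy_t_mix_samples = get_student_t_data(n_samples=10000)
# convert the numpy array into an R numeric vector
r_t_mix_samples = ro.FloatVector(numpy_t_mix_samples)
teigen = importr('teigen')
# c(...) is R syntax; from Python, pass the model name as a plain string
rres = teigen.teigen(r_t_mix_samples, Gs=2, scale=False, dfupdate="numeric", models="univUU")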
The Gs argument is still hard-coded here, but it should become dynamic later.
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
iter: <class 'rpy2.robjects.vectors.FloatVector'>
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x11e3fdd08 / R:0x7ff7cced0a28>
[156.000000]
fuzzy: <class 'rpy2.robjects.vectors.Matrix'>
R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x11e3fd8c8 / R:0x118e78000>
[0.000000, 0.917546, 0.004050, ..., 0.077300, 0.076273, 0.091252]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
...
iter: <class 'rpy2.robjects.vectors.FloatVector'>
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x11d632508 / R:0x7ff7cfa81658>
[-25365.912426]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
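In summary, I would like to get the same results as the original R example in the first code block, except that the df, mean and scale variables should be Python lists / numpy arrays. Not knowing R at all makes working with rpy2 quite difficult; maybe there is a more elegant way to call this routine dynamically and get the results back in the Python world.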
Answer:
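Consider using x.names.index('myname') to reference nested named elements in R objects; see the rpy2 docs. As a reminder, and as demonstrated below, you can still use numeric indices to reference nested R and Python objects, keeping in mind that Python indexing is 0-based while R's is 1-based: R's tt[[3]][[1]] corresponds to Python's rres[2][0].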
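The name-based lookup can be wrapped in a small helper that follows a path of names, the Python counterpart of R's tt$parameters$df. This is a sketch: the helper name get_named is hypothetical, and it assumes only what the answer relies on, namely that each level exposes .names with an .index() method and supports integer indexing, as rpy2's ListVector does. (rpy2 vectors also offer .rx2('name'), which mirrors R's [[ operator.)

```python
def get_named(obj, *names):
    """Follow a path of element names through nested named R lists.

    Hypothetical helper: equivalent to R's obj$name1$name2...; it assumes each
    level exposes `.names` (with `.index()`) and integer indexing, which is
    what the answer uses on rpy2's ListVector.
    """
    for name in names:
        # translate the name to its 0-based position, then index by position
        obj = obj[obj.names.index(name)]
    return obj
```

With the rres object from the answer's code, get_named(rres, 'parameters', 'df') should return the same FloatVector as the two-step params[params.names.index('df')] lookup.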
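To reproduce the R object with exactly the same random data, set.seed must be run on the R side, because there is no simple way to find an equivalent random number generator across languages; see the related post. Finally, base R's as.vector() is used to convert array objects to vectors. All values returned in Python are R FloatVectors: <class 'rpy2.robjects.vectors.FloatVector'>.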
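Since the question asks for numpy arrays or Python lists rather than FloatVectors, one option is to copy the values out on the Python side. A minimal sketch, with hypothetical helper names r_to_numpy and r_to_list: it assumes only that the R object is iterable over numbers, which rpy2 vectors, matrices and arrays are. Iterating an rpy2 vector yields its elements in R's column-major storage order, so the flattened result has the same order as R's as.vector().

```python
import numpy as np


def r_to_numpy(r_vector):
    """Copy an R vector, matrix or array into a flat 1-D NumPy float array.

    Hypothetical helper: works on anything iterable over numbers, such as an
    rpy2 FloatVector; element order follows iteration order (column-major
    for R matrices and arrays, matching as.vector()).
    """
    return np.fromiter((float(x) for x in r_vector), dtype=float)


def r_to_list(r_vector):
    """Same values as r_to_numpy(), but as a plain Python list of floats."""
    return r_to_numpy(r_vector).tolist()
```

Applied to the objects extracted in the code below, for example r_to_numpy(params[params.names.index('sigma')]), this flattening plays the role of base.as_vector() and also leaves a numpy array instead of a FloatVector.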
Python
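from rpy2 import robjects as ro
from rpy2.robjects.packages import importr
base = importr('base')
stats = importr('stats')
teigen = importr('teigen')
# seed R's random number generator so the data matches the R script below
base.set_seed(101)
# draw the samples with R's rt(); FloatVector turns the Python list into an R numeric vector
data_sample = ro.FloatVector([5 + 3*i for i in stats.rt(1000, df=5)] +
                             [10 + 1*i for i in stats.rt(10000, df=20)])
num_components = 2
rres = teigen.teigen(data_sample, Gs=num_components, scale=False,
                     dfupdate="numeric", models="univUU")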
# BY NUMBER INDEX
df = rres[2][0]
mean = base.as_vector(rres[2][1])
scale = base.as_vector(rres[2][3])
print(df)
# [1] 3.578491 47.059841
print(mean)
# [1] 4.939179 10.002038
print(scale)
# [1] 8.763076 1.041588
# BY NAME INDEX
# (i.e., find corresponding number to name in R object)
params = rres[rres.names.index('parameters')]
df = params[params.names.index('df')]
mean = base.as_vector(params[params.names.index('mean')])
scale = base.as_vector(params[params.names.index('sigma')])
print(df)
# [1] 3.578491 47.059841
print(mean)
# [1] 4.939179 10.002038
print(scale)
# [1] 8.763076 1.041588
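To make Gs dynamic, as the question requires, the call can be wrapped in a function that takes the numpy array and the integer computed in Python. This is a sketch under stated assumptions: R, the teigen package and rpy2 are installed, and the names prepare_sample and run_teigen are hypothetical. rpy2 is imported inside the function only so the definitions load even where R is absent; int() guards against numpy integer types, which rpy2's default conversion may not accept.

```python
import numpy as np


def prepare_sample(data_sample):
    """Hypothetical helper: flatten a numpy array (or sequence) to Python floats."""
    return np.asarray(data_sample, dtype=float).ravel().tolist()


def run_teigen(data_sample, num_components, model="univUU"):
    """Hypothetical wrapper: fit teigen with a dynamic number of components.

    Assumes R, the teigen package and rpy2 are installed. Returns the raw R
    'teigen' object; df, mean and sigma can then be extracted by name as in
    the code above.
    """
    # imported lazily so this module can be imported without R available
    from rpy2 import robjects as ro
    from rpy2.robjects.packages import importr

    teigen = importr("teigen")
    r_sample = ro.FloatVector(prepare_sample(data_sample))
    return teigen.teigen(r_sample, Gs=int(num_components), scale=False,
                         dfupdate="numeric", models=model)
```

Usage would then be along the lines of rres = run_teigen(numpy_t_mix_samples, num_components), followed by the name-based extraction.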
R (equivalent script)
library(teigen)
set.seed(101)
data_sample <- c(5+ 3*rt(1000,df=5),
10+1*rt(10000,df=20))
num_components <- 2
tt <- teigen(data_sample, Gs=num_components, scale=FALSE,
dfupdate="numeric", models="univUU")
# BY NUMBER INDEX
df = tt[[3]][[1]]
mean = as.vector(tt[[3]][[2]])
scale = as.vector(tt[[3]][[4]])
print(df)
# [1] 3.578491 47.059841
print(mean)
# [1] 4.939179 10.002038
print(scale)
# [1] 8.763076 1.041588
# BY NAME INDEX
df = c(tt$parameters$df)
mean = c(tt$parameters$mean)
scale = c(tt$parameters$sigma)
print(df)
# [1] 3.578491 47.059841
print(mean)
# [1] 4.939179 10.002038
print(scale)
# [1] 8.763076 1.041588