Using model.matrix through rpy2?

Date: 2015-02-03 02:13:01

Tags: python r rpy2 model.matrix

I prefer Python to R for my work. From time to time I need to use R functions, so I have started trying out rpy2 for that purpose.

I tried, but failed, to find out how to replicate the following with rpy2:
design <- model.matrix(~Subject+Treat)

I have gotten this far:

import rpy2.robjects as robjects
fmla = robjects.Formula('~subject+treatment')
env = fmla.environment
env['subject'] = sbj_group
env['treatment'] = trt_group

This is based on what I saw here. However, I cannot work out how to call model.matrix. I tried a couple of different ways:

robjects.r.model_matrix(fmla)
robjects.r('model.matrix(%s)' %fmla.r_repr())

As you can guess, neither of them works.

I am new to rpy2 and fairly inexperienced in R. Any help would be much appreciated!

1 Answer:

Answer 0: (score: 3)

You can evaluate strings as R code:

import numpy as np
import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
ro.numpy2ri.activate() 
R = ro.r

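# Example data: 3 subjects, each receiving 4 treatments
subject = np.repeat([1,2,3], 4)
treatment = np.tile([1,2,3,4], 3)
# Copy the NumPy arrays into R's global environment
R.assign('subject', subject)
R.assign('treatment', treatment)
# Convert to factors so that model.matrix builds indicator columns
R('subject <- as.factor(subject)')
R('treatment <- as.factor(treatment)')
R('design <- model.matrix(~subject+treatment)')
R('print(design)')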

which yields

   (Intercept) subject2 subject3 treatment2 treatment3 treatment4
1            1        0        0          0          0          0
2            1        0        0          1          0          0
3            1        0        0          0          1          0
4            1        0        0          0          0          1
5            1        1        0          0          0          0
6            1        1        0          1          0          0
7            1        1        0          0          1          0
8            1        1        0          0          0          1
9            1        0        1          0          0          0
10           1        0        1          1          0          0
11           1        0        1          0          1          0
12           1        0        1          0          0          1
attr(,"assign")
[1] 0 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$subject
[1] "contr.treatment"

attr(,"contrasts")$treatment
[1] "contr.treatment"
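
To help read the printed matrix: with R's default contr.treatment contrasts, each factor uses its first level as the baseline and gets one 0/1 indicator column for every other level. The following is only a NumPy sketch that rebuilds the same numbers by hand for this example; it is not how R implements model.matrix, and the helper name treatment_dummies is made up for illustration.

```python
import numpy as np

def treatment_dummies(codes):
    """Hypothetical helper: 0/1 indicator columns for every level except the first (baseline)."""
    levels = np.unique(codes)  # sorted distinct levels
    # Compare each observation against every non-baseline level
    return (codes[:, None] == levels[1:]).astype(float)

subject = np.repeat([1, 2, 3], 4)
treatment = np.tile([1, 2, 3, 4], 3)

manual_design = np.column_stack([
    np.ones(len(subject)),          # (Intercept)
    treatment_dummies(subject),     # subject2, subject3
    treatment_dummies(treatment),   # treatment2, treatment3, treatment4
])
print(manual_design)
```

Row 6, for instance, belongs to subject 2 under treatment 2, so only the intercept, subject2 and treatment2 columns are 1.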

R(...) returns objects that can be manipulated on the Python side. For example,

design = R('model.matrix(~subject+treatment)')

assigns an rpy2.robjects.vectors.Matrix to design, and

arr = np.array(design)

makes arr a NumPy array:

[[ 1.  0.  0.  0.  0.  0.]
 [ 1.  0.  0.  1.  0.  0.]
 [ 1.  0.  0.  0.  1.  0.]
 [ 1.  0.  0.  0.  0.  1.]
 [ 1.  1.  0.  0.  0.  0.]
 [ 1.  1.  0.  1.  0.  0.]
 [ 1.  1.  0.  0.  1.  0.]
 [ 1.  1.  0.  0.  0.  1.]
 [ 1.  0.  1.  0.  0.  0.]
 [ 1.  0.  1.  1.  0.  0.]
 [ 1.  0.  1.  0.  1.  0.]
 [ 1.  0.  1.  0.  0.  1.]]

The column names can be accessed with
np.array(design.colnames)
# array(['(Intercept)', 'subject2', 'subject3', 'treatment2', 'treatment3',
#        'treatment4'], 
#       dtype='|S11')