Data reshaping and logical indexing in R

Time: 2012-02-16 03:34:16

Tags: r

I have the following (dummy) data:

d <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 5L, 
5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
2L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("apple", "grapefruit", 
"orange", "peach", "pear"), class = "factor"), type = structure(c(2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("large", 
"small"), class = "factor"), location = structure(c(1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("P1", 
"P2", "P3"), class = "factor"), diameter = c(17.2, 19.1, 18.5, 
23.3, 22.9, 19.4, 11.1, 11.8, 6.8, 3.2, 7.9, 5.6, 8.4, 9.2, 9.7, 
17.1, 19.4, 18.9, 11.8, 10.6, 10.1, 18.8, 17.9, 13.2, 8.5, 8.9, 
7.2, 10.1, 8.7, 6.6)), .Names = c("group", "type", "location", 
"diameter"), class = "data.frame", row.names = c(NA, -30L))

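I would like to create a new data frame from it by computing ratios of the "diameter" variable between levels of "location", for every combination of the other two factors, "type" and "group". For example, for the "pear" group: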

P3.P1.L <- with(d, diameter[group=="pear" & type=="large" & location=="P3"] / diameter[group=="pear" & type=="large" & location=="P1"] )
P2.P1.L <- with(d, diameter[group=="pear" & type=="large" & location=="P2"] / diameter[group=="pear" & type=="large" & location=="P1"] )
P3.P1.S <- with(d, diameter[group=="pear" & type=="small" & location=="P3"] / diameter[group=="pear" & type=="small" & location=="P1"] )
P2.P1.S <- with(d, diameter[group=="pear" & type=="small" & location=="P2"] / diameter[group=="pear" & type=="small" & location=="P1"] )

The final data.frame would look something like this:

group, type, P2.P1, P3.P1
pear, large, 2.469, 1.75
pear, small, 1.063, 0.613
apple, large, ..., ...
apple, small, ..., ...

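Obviously, I could do it as illustrated above, logically indexing the right levels of the 3 factors in every instance. The problem is that in my real data the "group" factor has about 40 levels (though there are still only 2 "type" levels). I would like a solution that lets me use logical indexing on "location" and perhaps "type", and then loops over all levels of "group". For example, something like: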

with(d, by(d, group, function(x) diameter[type=="large" & location=="P3"] / diameter[type=="large" & location=="P1"]) )

But this doesn't do what I want (and using "group == x" doesn't work either).

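A solution that keeps track of each ratio together with its "group" and "type" factor levels, and then puts these into a new data frame as in the desired output above, would be spectacular. Any suggestions on how to approach this would be much appreciated.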

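1 Answer:

Answer 0: (score: 2)

You can use dcast to convert the data to a wider format, with one row per group/type combination and one column per location.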

library(reshape2)
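# value.var names the measured column explicitly, so dcast does not have to guess it
d <- dcast( d, group + type ~ location, value.var = "diameter" )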

Then compute the ratios you want directly, for example:

transform( d, P2.P1=P2/P1, P3.P1=P3/P1 )
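As a rough, self-contained check of the same idea without reshape2, the sketch below uses base R's reshape() as a stand-in for dcast. It is only an illustration: the object names pear, wide and ratios are made up here, and the six rows are the "pear" rows copied from the question's data.

```r
# "pear" rows only, copied from the question's data (object name is hypothetical)
pear <- data.frame(
  group    = "pear",
  type     = rep(c("small", "large"), each = 3),
  location = rep(c("P1", "P2", "P3"), times = 2),
  diameter = c(11.1, 11.8, 6.8, 3.2, 7.9, 5.6)
)

# One row per group/type, one column per location (base R stand-in for dcast)
wide <- reshape(pear, idvar = c("group", "type"), timevar = "location",
                direction = "wide")

# reshape() names the new columns diameter.P1, diameter.P2, ...; strip the prefix
names(wide) <- sub("^diameter\\.", "", names(wide))

# Same ratio step as in the answer
ratios <- transform(wide, P2.P1 = P2 / P1, P3.P1 = P3 / P1)
ratios[, c("group", "type", "P2.P1", "P3.P1")]
```

For the pear rows this gives P2.P1 of about 2.469 (large) and 1.063 (small), and P3.P1 of 1.75 (large) and about 0.613 (small). Because the division is done column-wise on the wide table, the same two lines cover all 40 or so groups without looping over them.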