Question

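I'm trying to use tapply to reshape some data tables. With one factor, one variable, and the function you want to apply, it's straightforward. But I have some datasets that I want to reformat using two (or more) grouping levels.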

Consider:

x<-1:20 # variable
y<-factor(rep(letters[1:5], each=4)) # first grouping variable
z<-factor(rep(letters[6:7], each=10)) # second grouping variable
tapply(x,z,sum) # summarized table for factor z

  f   g 
 55 155

tapply(x,y,sum) # summarized table for factor y

 a  b  c  d  e
10 26 42 58 74

However, the output I want is a table something like this:

f  f  f  f  f g  g  g  g  g
a  b  c  d  e a  b  c  d  e
6  8  10....etc

So I'm just trying to keep the higher-level grouping in the table. Sorry if this is a simple question; I've looked around and couldn't find anything.

Answer 1

If you're working with large datasets, the dplyr package can be easier and faster. However, it only works on data frames.

library(dplyr) # provides group_by() and summarise()
d <- data.frame(x=x,y=y,z=z)

For the first case:

groups <- group_by(d,z)
summarise(groups,sum(x))

  z sum(x)
1 f     55
2 g    155

For the second case:

groups <- group_by(d,y)
summarise(groups,sum(x))

  y sum(x)
1 a     10
2 b     26
3 c     42
4 d     58
5 e     74

For the last case:

groups <- group_by(d,z,y)
summarise(groups,sum(x))

  z y sum(x)
1 f a     10
2 f b     26
3 f c     19
4 g c     23
5 g d     58
6 g e     74
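If dplyr isn't available, roughly the same six sums can be had in base R as a flat named vector, which is closer to the layout sketched in the question. This is only a sketch on the question's example data; the names zy and flat are made up for illustration.

```r
# Same data as in the question
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))
z <- factor(rep(letters[6:7], each = 10))

# Combine both factors into a single one.
# drop = TRUE removes combinations that never occur (e.g. f with e);
# lex.order = TRUE keeps z as the outer (slower-varying) grouping.
zy <- interaction(z, y, drop = TRUE, lex.order = TRUE)

# One sum per observed z/y combination, named "f.a", "f.b", ...
flat <- tapply(x, zy, sum)
flat
```

The names of the result carry both grouping levels, so the higher-level group stays visible in the output.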

Answer 2

Here is the code I used on my own data:

with(reduced, do.call(rbind, tapply(WR, list(period, no.C), 
                           function(x) c(WR = mean(x), SD = sd(x)))))

reduced is my data frame
WR is the variable I want to calculate the mean of
period is one of my grouping variables; in this case it's binary
no.C is another grouping variable; here I have 3 groups

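The rest of the expression is the function. If you only need a single value, you can easily replace it with just mean (or sum, or any other statistic). I also wanted the standard deviation, so I bind both into a small table with rbind that I can print later. Sorry that I haven't put the answer in the context of your data, but I was confused about what exactly you wanted.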

Basically, by passing a list you can give tapply as many grouping variables as you like.
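Applied to the example data from the question, the list form might look like the sketch below. The variable name tab is just for illustration, and whether this two-way layout is what the asker wants is an assumption on my part.

```r
# Same data as in the question
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))
z <- factor(rep(letters[6:7], each = 10))

# Two grouping factors in a list: the result is a matrix with
# one row per level of z and one column per level of y.
# Combinations that never occur are filled with NA.
tab <- tapply(x, list(z, y), sum)
tab
```

Each extra factor in the list adds a dimension to the result, so three grouping variables give a three-dimensional array.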

You can also do something similar with aggregate; see this quick web page for a neat answer and example question.

with(reduced, aggregate(WR, list(period, no.C), mean))
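For the data in the question, an aggregate call could be sketched with the formula interface as below. The names d and agg are illustrative only, and unlike tapply this returns a long data frame with one row per observed combination.

```r
# Same data as in the question, collected in a data frame
d <- data.frame(
  x = 1:20,
  y = factor(rep(letters[1:5], each = 4)),
  z = factor(rep(letters[6:7], each = 10))
)

# Sum x within every observed combination of z and y.
# Combinations that never occur are simply absent from the result.
agg <- aggregate(x ~ z + y, data = d, FUN = sum)
agg
```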

tapply function with multiple groups
