Problem using the apply functions

Time: 2015-01-21 19:07:33

Tags: r

I have a small problem with the apply functions in R. I have a data frame called bakery:

head(bakery)
  Day.of.Week White Wheat Multigrain Black Cinnamon.Raisin Sour.Dough.French Light.Oat
1           5   436   456        417   311              95                96       224
2           6   653   571        557   416             129               140       224
3           1   496   490        403   351             114               108       228
4           2   786   611        570   473             165               148       304
5           4   547   474        424   365             144               104       256
6           5   513   443        380   317             100                92       180

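The first column is the day-of-week code; every other column shows how many loaves of one kind of bread were sold on a particular day. My task is to create a new variable that holds, for each day of the week, the mean sales (over all kinds of bread). I did it with this command: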

x12 <- 0
for (i in 2:8) {
  x12 <- x12 + tapply(bakery[, i], bakery[, 1], mean)
}
x12
#    1    2    4    5    6 
# 2190 3057 2314 2030 2690 

Can I do the same thing with the apply/sapply functions?

4 answers:

Answer 0: (score: 2)

Because you want to group by day of the week, tapply is a good choice. You can do

tapply(rowSums(bakery[,-1]), factor(bakery[,1]), mean)

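because in this case the mean of the sums should be the same as the sum of the means. It is not easy to test, because your example result does not seem to match your test data (there are no rows with a Day.of.Week of 7).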
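The claim that the mean of the row sums equals the sum of the column means can be checked on the six rows shown in the question, and the same check gives an sapply-based version of the loop. This is a minimal sketch, assuming the data frame holds only those six rows and has no missing values; the names sum_of_means and mean_of_sums are illustrative.

```r
# The six rows shown in the question
bakery <- data.frame(
  Day.of.Week       = c(5L, 6L, 1L, 2L, 4L, 5L),
  White             = c(436L, 653L, 496L, 786L, 547L, 513L),
  Wheat             = c(456L, 571L, 490L, 611L, 474L, 443L),
  Multigrain        = c(417L, 557L, 403L, 570L, 424L, 380L),
  Black             = c(311L, 416L, 351L, 473L, 365L, 317L),
  Cinnamon.Raisin   = c(95L, 129L, 114L, 165L, 144L, 100L),
  Sour.Dough.French = c(96L, 140L, 108L, 148L, 104L, 92L),
  Light.Oat         = c(224L, 224L, 228L, 304L, 256L, 180L)
)

# sapply version of the loop: one column of per-day means for each bread,
# then add the columns up row by row (sum of means)
per_day_means <- sapply(bakery[-1], function(col) tapply(col, bakery[[1]], mean))
sum_of_means  <- rowSums(per_day_means)

# tapply version from this answer (mean of sums)
mean_of_sums <- tapply(rowSums(bakery[-1]), factor(bakery[[1]]), mean)

sum_of_means
#    1    2    4    5    6
# 2190 3057 2314 2030 2690
all.equal(unname(sum_of_means), as.vector(mean_of_sums))
# TRUE
```

The two agree here because every bread column has a value in every row. If some cells were NA and the means were taken with na.rm = TRUE, the two quantities could differ.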

Answer 1: (score: 1)

Also:

rowsum(bakery[-1], bakery[[1]]) / table(bakery[[1]])
#  White Wheat Multigrain Black Cinnamon.Raisin Sour.Dough.French Light.Oat
#1 496.0 490.0      403.0   351           114.0               108       228
#2 786.0 611.0      570.0   473           165.0               148       304
#4 547.0 474.0      424.0   365           144.0               104       256
#5 474.5 449.5      398.5   314            97.5                94       202
#6 653.0 571.0      557.0   416           129.0               140       224

rowSums(rowsum(bakery[-1], bakery[[1]]) / table(bakery[[1]]))
#   1    2    4    5    6 
#2190 3057 2314 2030 2690

where:

bakery = structure(list(Day.of.Week = c(5L, 6L, 1L, 2L, 4L, 5L), White = c(436L, 
653L, 496L, 786L, 547L, 513L), Wheat = c(456L, 571L, 490L, 611L, 
474L, 443L), Multigrain = c(417L, 557L, 403L, 570L, 424L, 380L
), Black = c(311L, 416L, 351L, 473L, 365L, 317L), Cinnamon.Raisin = c(95L, 
129L, 114L, 165L, 144L, 100L), Sour.Dough.French = c(96L, 140L, 
108L, 148L, 104L, 92L), Light.Oat = c(224L, 224L, 228L, 304L, 
256L, 180L)), .Names = c("Day.of.Week", "White", "Wheat", "Multigrain", 
"Black", "Cinnamon.Raisin", "Sour.Dough.French", "Light.Oat"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))
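To unpack the one-liner above: rowsum() adds up the rows that belong to each day, table() counts how many rows each day has, and their quotient is the per-day mean of every column. The sketch below spells out those steps on the same six rows; the intermediate names (group_sums, group_sizes, group_means) are illustrative and not part of the original answer.

```r
# Same six rows as in the structure() call above
bakery <- data.frame(
  Day.of.Week       = c(5L, 6L, 1L, 2L, 4L, 5L),
  White             = c(436L, 653L, 496L, 786L, 547L, 513L),
  Wheat             = c(456L, 571L, 490L, 611L, 474L, 443L),
  Multigrain        = c(417L, 557L, 403L, 570L, 424L, 380L),
  Black             = c(311L, 416L, 351L, 473L, 365L, 317L),
  Cinnamon.Raisin   = c(95L, 129L, 114L, 165L, 144L, 100L),
  Sour.Dough.French = c(96L, 140L, 108L, 148L, 104L, 92L),
  Light.Oat         = c(224L, 224L, 228L, 304L, 256L, 180L)
)

# Per-day column totals; rows come back sorted by day (1, 2, 4, 5, 6)
group_sums <- rowsum(bakery[-1], bakery[[1]])

# Number of rows per day, in the same sorted order
group_sizes <- as.vector(table(bakery[[1]]))

# Each column is divided element-wise by the per-day counts
group_means <- group_sums / group_sizes

# Adding the bread columns gives the mean total sold per day
rowSums(group_means)
#    1    2    4    5    6
# 2190 3057 2314 2030 2690
```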

Answer 2: (score: 0)

Using dplyr:

library(dplyr)

# mean of every bread column within each day of the week
bakery %>%
  group_by(Day.of.Week) %>%
  summarise_each(funs(mean))

  Day.of.Week White Wheat Multigrain Black Cinnamon.Raisin Sour.Dough.French Light.Oat
1           1 496.0 490.0      403.0   351           114.0               108       228
2           2 786.0 611.0      570.0   473           165.0               148       304
3           4 547.0 474.0      424.0   365           144.0               104       256
4           5 474.5 449.5      398.5   314            97.5                94       202
5           6 653.0 571.0      557.0   416           129.0               140       224

If you are looking for the total bread sold per day:

bakery %>%
  mutate(SumVar=rowSums(.[-1])) %>%
  group_by(Day.of.Week) %>%
  select(Day.of.Week,SumVar) %>%
  summarise_each(funs(mean))

  Day.of.Week SumVar
1           1   2190
2           2   3057
3           4   2314
4           5   2030
5           6   2690

FIXED so that rowSums does not add the day column into the sum.
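The two pipelines above rely on summarise_each() and funs(), which later dplyr releases deprecated. A rough equivalent with across() might look like the sketch below. It assumes dplyr 1.0.0 or newer is installed, and the result names (per_bread, per_day) are illustrative.

```r
library(dplyr)

# Same six rows as in the question
bakery <- data.frame(
  Day.of.Week       = c(5L, 6L, 1L, 2L, 4L, 5L),
  White             = c(436L, 653L, 496L, 786L, 547L, 513L),
  Wheat             = c(456L, 571L, 490L, 611L, 474L, 443L),
  Multigrain        = c(417L, 557L, 403L, 570L, 424L, 380L),
  Black             = c(311L, 416L, 351L, 473L, 365L, 317L),
  Cinnamon.Raisin   = c(95L, 129L, 114L, 165L, 144L, 100L),
  Sour.Dough.French = c(96L, 140L, 108L, 148L, 104L, 92L),
  Light.Oat         = c(224L, 224L, 228L, 304L, 256L, 180L)
)

# Mean of every bread column per day (the grouping column is skipped by across)
per_bread <- bakery %>%
  group_by(Day.of.Week) %>%
  summarise(across(everything(), mean))

# Mean total sold per day: sum the bread columns row by row, then average by day
per_day <- bakery %>%
  mutate(SumVar = rowSums(across(-Day.of.Week))) %>%
  group_by(Day.of.Week) %>%
  summarise(SumVar = mean(SumVar))

per_day
```

On the six sample rows, per_day should hold the same five values as the SumVar table above.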

Answer 3: (score: 0)

A data.table-based solution:

library(data.table)

setDT(bakery)[,.(mean=mean(rowSums(.SD))),by=Day.of.Week]

#    Day.of.Week mean
# 1:           5 2030
# 2:           6 2690
# 3:           1 2190
# 4:           2 3057
# 5:           4 2314
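Two details of the call above are worth knowing: setDT() converts bakery to a data.table in place, and by= returns the groups in order of first appearance, which is why the days come out as 5, 6, 1, 2, 4. If the original data frame should stay untouched and the days should come back sorted, a variant along these lines may help. It is a sketch that assumes the data.table package is installed; the name res is illustrative.

```r
library(data.table)

# Same six rows as in the question
bakery <- data.frame(
  Day.of.Week       = c(5L, 6L, 1L, 2L, 4L, 5L),
  White             = c(436L, 653L, 496L, 786L, 547L, 513L),
  Wheat             = c(456L, 571L, 490L, 611L, 474L, 443L),
  Multigrain        = c(417L, 557L, 403L, 570L, 424L, 380L),
  Black             = c(311L, 416L, 351L, 473L, 365L, 317L),
  Cinnamon.Raisin   = c(95L, 129L, 114L, 165L, 144L, 100L),
  Sour.Dough.French = c(96L, 140L, 108L, 148L, 104L, 92L),
  Light.Oat         = c(224L, 224L, 228L, 304L, 256L, 180L)
)

# as.data.table() works on a copy, so bakery itself stays a plain data.frame;
# keyby= groups like by= but also sorts the result by the grouping column.
# .SD holds every column except the grouping column.
res <- as.data.table(bakery)[, .(mean = mean(rowSums(.SD))), keyby = Day.of.Week]
res
```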