I have a small problem with the apply functions in R.
I have a data frame, bakery:
head(bakery)
Day.of.Week White Wheat Multigrain Black Cinnamon.Raisin Sour.Dough.French Light.Oat
1 5 436 456 417 311 95 96 224
2 6 653 571 557 416 129 140 224
3 1 496 490 403 351 114 108 228
4 2 786 611 570 473 165 148 304
5 4 547 474 424 365 144 104 256
6 5 513 443 380 317 100 92 180
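The first column is the day-of-week code; all the other columns show the quantity of each kind of bread sold on that particular day. My task is to create a new variable that holds, for each day of the week, the mean (over all types of bread). I did it with this command: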
x12 <- 0
for (i in 2:8) {
x12<-x12+tapply(bakery[, i], bakery[, 1], mean)
}
x12
# 1 2 4 5 6
# 2190 3057 2314 2030 2690
Can I do the same thing with the apply or sapply functions?
Answer 0 (score: 2)
Since you want to group by day of the week, tapply would be a good choice. You could do
tapply(rowSums(bakery[,-1]), factor(bakery[,1]), mean)
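because in this case the mean of the sums should be the same as the sum of the means. It is not easy to test, because your example result does not seem to match your test data (note the rows with Day.of.Week 7).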
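The equivalence claimed above can be checked on the six rows shown in the question, and the same check doubles as a sapply version of the original loop. This is only a sketch: the names mean_of_sums, col_means and sum_of_means are made up for illustration, and it assumes there are no missing values, so every bread column has the same number of observations for each day.

```r
# Example data copied from the question (six rows).
bakery <- data.frame(
  Day.of.Week       = c(5L, 6L, 1L, 2L, 4L, 5L),
  White             = c(436L, 653L, 496L, 786L, 547L, 513L),
  Wheat             = c(456L, 571L, 490L, 611L, 474L, 443L),
  Multigrain        = c(417L, 557L, 403L, 570L, 424L, 380L),
  Black             = c(311L, 416L, 351L, 473L, 365L, 317L),
  Cinnamon.Raisin   = c(95L, 129L, 114L, 165L, 144L, 100L),
  Sour.Dough.French = c(96L, 140L, 108L, 148L, 104L, 92L),
  Light.Oat         = c(224L, 224L, 228L, 304L, 256L, 180L)
)

# Mean of the per-row totals, grouped by day (the tapply one-liner).
mean_of_sums <- tapply(rowSums(bakery[-1]), bakery[[1]], mean)

# Sum of the per-column means: sapply() builds a day-by-bread matrix of
# group means, and rowSums() adds the bread types together for each day.
col_means <- sapply(bakery[-1], function(col) tapply(col, bakery[[1]], mean))
sum_of_means <- rowSums(col_means)

# Compare the values only; the two results carry different attributes.
all.equal(as.numeric(mean_of_sums), as.numeric(sum_of_means))
```

The sapply call replaces the explicit for loop over columns 2 to 8, while tapply still does the grouping by day.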
Answer 1 (score: 1)
Alternatively:
rowsum(bakery[-1], bakery[[1]]) / table(bakery[[1]])
# White Wheat Multigrain Black Cinnamon.Raisin Sour.Dough.French Light.Oat
#1 496.0 490.0 403.0 351 114.0 108 228
#2 786.0 611.0 570.0 473 165.0 148 304
#4 547.0 474.0 424.0 365 144.0 104 256
#5 474.5 449.5 398.5 314 97.5 94 202
#6 653.0 571.0 557.0 416 129.0 140 224
rowSums(rowsum(bakery[-1], bakery[[1]]) / table(bakery[[1]]))
# 1 2 4 5 6
#2190 3057 2314 2030 2690
where:
bakery = structure(list(Day.of.Week = c(5L, 6L, 1L, 2L, 4L, 5L), White = c(436L,
653L, 496L, 786L, 547L, 513L), Wheat = c(456L, 571L, 490L, 611L,
474L, 443L), Multigrain = c(417L, 557L, 403L, 570L, 424L, 380L
), Black = c(311L, 416L, 351L, 473L, 365L, 317L), Cinnamon.Raisin = c(95L,
129L, 114L, 165L, 144L, 100L), Sour.Dough.French = c(96L, 140L,
108L, 148L, 104L, 92L), Light.Oat = c(224L, 224L, 228L, 304L,
256L, 180L)), .Names = c("Day.of.Week", "White", "Wheat", "Multigrain",
"Black", "Cinnamon.Raisin", "Sour.Dough.French", "Light.Oat"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
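Answer 2 (score: 0)
Using dplyr:
library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr;
# summarise(across(everything(), mean)) is the modern equivalent.
bakery %>%
group_by(Day.of.Week) %>%
summarise_each(funs(mean))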
Day.of.Week White Wheat Multigrain Black Cinnamon.Raisin Sour.Dough.French Light.Oat
1 1 496.0 490.0 403.0 351 114.0 108 228
2 2 786.0 611.0 570.0 473 165.0 148 304
3 4 547.0 474.0 424.0 365 144.0 104 256
4 5 474.5 449.5 398.5 314 97.5 94 202
5 6 653.0 571.0 557.0 416 129.0 140 224
If you are looking for the total bread sold per day:
bakery %>%
mutate(SumVar=rowSums(.[-1])) %>%
group_by(Day.of.Week) %>%
select(Day.of.Week,SumVar) %>%
summarise_each(funs(mean))
Day.of.Week SumVar
1 1 2190
2 2 3057
3 4 2314
4 5 2030
5 6 2690
FIXED so that rowSums does not add the day code into the sum.
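To see why the day column has to be dropped before summing, here is a small base R sketch on the question's data. The variable names totals_with_day and totals_bread are hypothetical, chosen only for this illustration.

```r
# Example data copied from the question (six rows).
bakery <- data.frame(
  Day.of.Week       = c(5L, 6L, 1L, 2L, 4L, 5L),
  White             = c(436L, 653L, 496L, 786L, 547L, 513L),
  Wheat             = c(456L, 571L, 490L, 611L, 474L, 443L),
  Multigrain        = c(417L, 557L, 403L, 570L, 424L, 380L),
  Black             = c(311L, 416L, 351L, 473L, 365L, 317L),
  Cinnamon.Raisin   = c(95L, 129L, 114L, 165L, 144L, 100L),
  Sour.Dough.French = c(96L, 140L, 108L, 148L, 104L, 92L),
  Light.Oat         = c(224L, 224L, 228L, 304L, 256L, 180L)
)

# Summing every column also adds the day code to each row total.
totals_with_day <- rowSums(bakery)

# Dropping the first column sums the bread counts only.
totals_bread <- rowSums(bakery[-1])

# The difference between the two is exactly the Day.of.Week code.
totals_with_day - totals_bread
```

In the dplyr pipeline the same exclusion is written as .[-1] inside mutate().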
Answer 3 (score: 0)
A solution based on data.table:
library(data.table)
setDT(bakery)[,.(mean=mean(rowSums(.SD))),by=Day.of.Week]
# Day.of.Week mean
# 1: 5 2030
# 2: 6 2690
# 3: 1 2190
# 4: 2 3057
# 5: 4 2314