Consider this for loop:
alpha <- data.frame()
for (i in 1:30)
{
  nam <- paste("d", i, sep = "")
  assign(nam, filter(a1, day(date) == i))
  nam <- aggregate(steps~group, nam, sum()) # I want to access d[i] through variable "nam" which is showing error
  alpha <- rbind(alpha, nam)
}
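In each iteration of the for loop, I want to filter one day (from 1 to 30), use aggregate to sum by the group column, and finally rbind the result of each iteration to build a new data frame, alpha.
But this gives the following error on the third line inside the for loop: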
Error in eval(predvars, data, env) :
invalid 'envir' argument of type 'character'
My data frame a1:
# A tibble: 8,640 x 5
steps date interval interval.1 group
<dbl> <fct> <int> <dttm> <fct>
1 0 2012-11-01 0 2012-11-01 00:00:00 0
2 0 2012-11-01 5 2012-11-01 00:05:00 0
3 0 2012-11-01 10 2012-11-01 00:10:00 0
4 0 2012-11-01 15 2012-11-01 00:15:00 0
5 0 2012-11-01 20 2012-11-01 00:20:00 0
6 0 2012-11-01 25 2012-11-01 00:25:00 0
7 0 2012-11-01 30 2012-11-01 00:30:00 0
8 0 2012-11-01 35 2012-11-01 00:35:00 0
9 0 2012-11-01 40 2012-11-01 00:40:00 0
10 0 2012-11-01 45 2012-11-01 00:45:00 0
# ... with 8,630 more rows
Could you explain how to solve this? Any answer that produces the output I want is fine.
Edit 1
Output of dput(head(a1, 10)):
structure(list(steps = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), date = structure(c(32L,
32L, 32L, 32L, 32L, 32L, 32L, 32L, 32L, 32L), .Label = c("2012-10-01",
"2012-10-02", "2012-10-03", "2012-10-04", "2012-10-05", "2012-10-06",
"2012-10-07", "2012-10-08", "2012-10-09", "2012-10-10", "2012-10-11",
"2012-10-12", "2012-10-13", "2012-10-14", "2012-10-15", "2012-10-16",
"2012-10-17", "2012-10-18", "2012-10-19", "2012-10-20", "2012-10-21",
"2012-10-22", "2012-10-23", "2012-10-24", "2012-10-25", "2012-10-26",
"2012-10-27", "2012-10-28", "2012-10-29", "2012-10-30", "2012-10-31",
"2012-11-01", "2012-11-02", "2012-11-03", "2012-11-04", "2012-11-05",
"2012-11-06", "2012-11-07", "2012-11-08", "2012-11-09", "2012-11-10",
"2012-11-11", "2012-11-12", "2012-11-13", "2012-11-14", "2012-11-15",
"2012-11-16", "2012-11-17", "2012-11-18", "2012-11-19", "2012-11-20",
"2012-11-21", "2012-11-22", "2012-11-23", "2012-11-24", "2012-11-25",
"2012-11-26", "2012-11-27", "2012-11-28", "2012-11-29", "2012-11-30"
), class = "factor"), interval = c(0L, 5L, 10L, 15L, 20L, 25L,
30L, 35L, 40L, 45L), interval.1 = structure(c(1351708200, 1351708500,
1351708800, 1351709100, 1351709400, 1351709700, 1351710000, 1351710300,
1351710600, 1351710900), class = c("POSIXct", "POSIXt"), tzone = ""),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("0", "100", "200", "300", "400", "500", "600",
"700", "800", "900", "1000", "1100", "1200", "1300", "1400",
"1500", "1600", "1700", "1800", "1900", "2000", "2100", "2200",
"2300"), class = "factor")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
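The error comes from the third line of the loop body: nam holds the character string "d1", "d2", ..., so aggregate() is handed a string as its data argument, and model.frame() then fails with the "invalid 'envir'" error. FUN should also be sum rather than sum(), which simply evaluates to 0. get(nam) would fetch the object named by the string, but keeping each day's result in a list avoids creating d1 ... d30 at all. Below is a minimal base R sketch of the loop as described, assuming a1 has the steps, date and group columns shown above. It swaps dplyr::filter() and lubridate::day() for base subsetting so it runs without packages, and the small a1 built here is hypothetical sample data:

```r
# Hypothetical sample data shaped like a1: 2 days x 4 intervals
a1 <- data.frame(
  steps = c(0, 5, 10, 20, 1, 2, 3, 4),
  date  = factor(rep(c("2012-11-01", "2012-11-02"), each = 4)),
  group = factor(c("0", "0", "100", "100", "0", "0", "100", "100"))
)

# Day of month for each row (base R stand-in for lubridate::day())
day_of_month <- as.integer(format(as.Date(as.character(a1$date)), "%d"))

results <- list()                      # one aggregated data frame per day
for (i in 1:30) {
  d_i <- a1[day_of_month == i, ]       # rows for day i (replaces filter())
  if (nrow(d_i) == 0) next             # aggregate() stops on zero rows
  # Pass the data frame itself as data, and the function sum (not sum())
  agg <- aggregate(steps ~ group, data = d_i, FUN = sum)
  agg$day <- i
  results[[length(results) + 1]] <- agg
}
alpha <- do.call(rbind, results)       # stack all days once, after the loop
alpha
```

Collecting the pieces in a list and stacking them once is usually preferred over growing alpha with rbind() inside the loop, which copies the accumulated data frame on every iteration.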
Answer 0 (score: 2)
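Since you're using dplyr anyway, you can use summarise instead of aggregate, which simplifies things a lot. Given a data frame like this (note that I've left out some irrelevant variables):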
# A tibble: 30 x 3
steps interval group
<int> <dttm> <int>
1 1 2012-11-01 00:00:00 1
2 4 2012-11-01 00:05:00 1
3 4 2012-11-01 00:10:00 1
4 5 2012-11-01 00:15:00 1
5 6 2012-11-01 00:20:00 1
6 6 2012-11-01 00:25:00 2
7 6 2012-11-01 00:30:00 2
8 7 2012-11-01 00:35:00 2
9 9 2012-11-01 00:40:00 2
10 10 2012-11-01 00:45:00 2
# … with 20 more rows
The following groups it by date and group, then computes a summary for each combination (in this case, the sum of steps):
library(dplyr)
library(lubridate)  # date() below is lubridate::date(), which masks base::date()

df %>%
  group_by(date = date(interval), group) %>%
  summarize(sum = sum(steps))
This produces:
# A tibble: 6 x 3
# Groups: date [3]
date group sum
<date> <int> <int>
1 2012-11-01 1 20
2 2012-11-01 2 38
3 2012-11-02 1 14
4 2012-11-02 2 42
5 2012-11-03 1 12
6 2012-11-03 2 38
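Applied to the question's own a1, which already has a date column, the same pattern needs no date() conversion. A sketch assuming dplyr 1.0 or later (for the .groups argument) and a small hypothetical a1 with the question's column types:

```r
library(dplyr)

# Hypothetical sample data shaped like a1 (date is already a factor column)
a1 <- tibble(
  steps = c(0, 5, 10, 20, 1, 2, 3, 4),
  date  = factor(rep(c("2012-11-01", "2012-11-02"), each = 4)),
  group = factor(c("0", "0", "100", "100", "0", "0", "100", "100"))
)

# One row per (date, group) with the total steps; no loop or rbind needed
alpha <- a1 %>%
  group_by(date, group) %>%
  summarise(steps = sum(steps), .groups = "drop")
alpha
```

Because group_by() drops empty groups by default, unused factor levels in date do not produce extra rows.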
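The main benefit here is clarity, and you get the per-group sums directly instead of having to stack data frames afterwards. Alternatively, if you'd rather stick with base R, something like aggregate(steps ~ group + as.Date(interval), df, sum) or aggregate(df$steps, by = list(group = df$group, date = as.Date(df$interval)), sum) is also a very concise option here (base R's date() takes no arguments, so as.Date() takes the place of lubridate's date()).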
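Both base R calls can be checked on a small data set. This sketch assumes a hypothetical df with the same steps, interval and group columns as the answer's data, with interval stored as POSIXct in UTC:

```r
# Hypothetical sample data: 5-minute readings over two days (UTC)
df <- data.frame(
  steps    = c(1L, 4L, 6L, 7L, 1L, 2L, 7L, 8L),
  interval = as.POSIXct("2012-11-01 00:00:00", tz = "UTC") +
             c(0, 300, 600, 900, 86400, 86700, 87000, 87300),
  group    = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L)
)

# Formula interface: the date is derived on the fly from the POSIXct column
res1 <- aggregate(steps ~ group + as.Date(interval), data = df, FUN = sum)

# Non-formula interface: name the grouping columns explicitly
res2 <- aggregate(df$steps,
                  by = list(group = df$group, date = as.Date(df$interval)),
                  FUN = sum)
res2
```

In both results the first grouping variable (group) varies fastest, so the rows come out as group 1 and 2 for the first date, then group 1 and 2 for the second.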
df <- structure(list(steps = c(1L, 4L, 4L, 5L, 6L, 6L, 6L, 7L, 9L,
10L, 1L, 2L, 3L, 3L, 5L, 7L, 8L, 8L, 9L, 10L, 1L, 2L, 2L, 3L,
4L, 6L, 6L, 7L, 9L, 10L), interval = structure(c(1351728000,
1351728300, 1351728600, 1351728900, 1351729200, 1351729500, 1351729800,
1351730100, 1351730400, 1351730700, 1351814400, 1351814700, 1351815000,
1351815300, 1351815600, 1351815900, 1351816200, 1351816500, 1351816800,
1351817100, 1351900800, 1351901100, 1351901400, 1351901700, 1351902000,
1351902300, 1351902600, 1351902900, 1351903200, 1351903500), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L)), row.names = c(NA, -30L), class = c("tbl_df",
"tbl", "data.frame"))
Answer 1 (score: 1)
Try splitting the data by date, then running aggregate on each piece.
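# drop = TRUE skips factor levels with no rows (a1$date also carries October levels)
lst1 <- lapply(split(a1, a1$date, drop = TRUE),
               function(x) aggregate(steps ~ group, x, sum))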
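This should give you a list of data frames, one per date, each holding the sum of steps for every group. You can access an individual data frame with lst1[[1]], lst1[[2]], and so on.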
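One caveat for the question's data: a1$date is a factor whose levels include October dates (see the dput output), while the rows start at 2012-11-01. split() keeps unused levels by default and returns empty data frames for them, and aggregate() stops with "no rows to aggregate" on those. A sketch with drop = TRUE, using hypothetical sample data that carries one unused level:

```r
# Hypothetical sample data: the date factor has an unused level, like a1$date
a1 <- data.frame(
  steps = c(0, 5, 10, 20, 1, 2, 3, 4),
  date  = factor(rep(c("2012-11-01", "2012-11-02"), each = 4),
                 levels = c("2012-10-31", "2012-11-01", "2012-11-02")),
  group = factor(c("0", "0", "100", "100", "0", "0", "100", "100"))
)

# drop = TRUE keeps only dates that actually occur in the rows
lst1 <- lapply(split(a1, a1$date, drop = TRUE),
               function(x) aggregate(steps ~ group, x, sum))

names(lst1)   # one list element per date present in the data
lst1[[1]]     # sum of steps per group for 2012-11-01
```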
To get the output in a single data frame, we can use do.call
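The code after do.call is cut off in the original. A common way to finish this step, assuming rbind is the function being applied, is sketched below. The hypothetical lst1 here mimics the list produced above, and the date column is added so that the stacked result still shows which day each row belongs to:

```r
# Hypothetical per-date results, shaped like the elements of lst1
lst1 <- list(
  "2012-11-01" = data.frame(group = c("0", "100"), steps = c(5, 30)),
  "2012-11-02" = data.frame(group = c("0", "100"), steps = c(3, 7))
)

# Record which date each piece came from before stacking
lst1 <- Map(function(d, name) { d$date <- name; d }, lst1, names(lst1))

# do.call() passes every list element as a separate argument to rbind()
alpha <- do.call(rbind, lst1)
rownames(alpha) <- NULL   # drop row names like "2012-11-01.1"
alpha
```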