Cannot access a specific variable inside a for loop

Time: 2019-06-26 07:30:58

Tags: r dataframe for-loop aggregate assign

Consider this for loop:

alpha <- data.frame()

for(i in 1:30)
{
 nam <- paste("d", i, sep = "")
 assign(nam,  filter(a1,day(date)==i))
 nam <- aggregate(steps~group,nam,sum()) #I want to access d[i] through variable "nam" which is showing error
alpha <- rbind(alpha,nam) 
}

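In each iteration of the for loop, I want to filter by day of the month (1 to 30), aggregate the result by the group column, and finally rbind every iteration's result to build a new data frame, alpha.

But this produces the following error on the third line inside the for loop: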

Error in eval(predvars, data, env) : 
  invalid 'envir' argument of type 'character'
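The message arises because aggregate() is handed the character string held in nam (for example "d1") as its data argument, not the data frame that assign() created under that name. A minimal sketch of the difference, assuming a small made-up data frame d1 in place of the filtered rows of a1:

```r
# Toy stand-in for one day's rows of a1 (hypothetical values)
d1 <- data.frame(steps = c(1, 2, 3, 4), group = c("0", "0", "100", "100"))
nam <- "d1"

# Passing the name fails: `nam` is the string "d1", not a data frame
res <- tryCatch(aggregate(steps ~ group, nam, sum),
                error = function(e) conditionMessage(e))

# get() looks up the object that the string names
ok <- aggregate(steps ~ group, get(nam), sum)
```

Note also that the function is passed as sum, not sum(). Keeping the filtered rows in an ordinary variable and dropping assign() altogether avoids the lookup entirely.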

My data frame a1:

# A tibble: 8,640 x 5
   steps date       interval interval.1          group
   <dbl> <fct>         <int> <dttm>              <fct>
 1     0 2012-11-01        0 2012-11-01 00:00:00 0    
 2     0 2012-11-01        5 2012-11-01 00:05:00 0    
 3     0 2012-11-01       10 2012-11-01 00:10:00 0    
 4     0 2012-11-01       15 2012-11-01 00:15:00 0    
 5     0 2012-11-01       20 2012-11-01 00:20:00 0    
 6     0 2012-11-01       25 2012-11-01 00:25:00 0    
 7     0 2012-11-01       30 2012-11-01 00:30:00 0    
 8     0 2012-11-01       35 2012-11-01 00:35:00 0    
 9     0 2012-11-01       40 2012-11-01 00:40:00 0    
10     0 2012-11-01       45 2012-11-01 00:45:00 0    
# ... with 8,630 more rows

Could you explain how to solve this? Any answer that produces my desired output is fine.

Edit 1

dput(head(a1, 10)) gives:


structure(list(steps = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), date = structure(c(32L, 
32L, 32L, 32L, 32L, 32L, 32L, 32L, 32L, 32L), .Label = c("2012-10-01", 
"2012-10-02", "2012-10-03", "2012-10-04", "2012-10-05", "2012-10-06", 
"2012-10-07", "2012-10-08", "2012-10-09", "2012-10-10", "2012-10-11", 
"2012-10-12", "2012-10-13", "2012-10-14", "2012-10-15", "2012-10-16", 
"2012-10-17", "2012-10-18", "2012-10-19", "2012-10-20", "2012-10-21", 
"2012-10-22", "2012-10-23", "2012-10-24", "2012-10-25", "2012-10-26", 
"2012-10-27", "2012-10-28", "2012-10-29", "2012-10-30", "2012-10-31", 
"2012-11-01", "2012-11-02", "2012-11-03", "2012-11-04", "2012-11-05", 
"2012-11-06", "2012-11-07", "2012-11-08", "2012-11-09", "2012-11-10", 
"2012-11-11", "2012-11-12", "2012-11-13", "2012-11-14", "2012-11-15", 
"2012-11-16", "2012-11-17", "2012-11-18", "2012-11-19", "2012-11-20", 
"2012-11-21", "2012-11-22", "2012-11-23", "2012-11-24", "2012-11-25", 
"2012-11-26", "2012-11-27", "2012-11-28", "2012-11-29", "2012-11-30"
), class = "factor"), interval = c(0L, 5L, 10L, 15L, 20L, 25L, 
30L, 35L, 40L, 45L), interval.1 = structure(c(1351708200, 1351708500, 
1351708800, 1351709100, 1351709400, 1351709700, 1351710000, 1351710300, 
1351710600, 1351710900), class = c("POSIXct", "POSIXt"), tzone = ""), 
    group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), .Label = c("0", "100", "200", "300", "400", "500", "600", 
    "700", "800", "900", "1000", "1100", "1200", "1300", "1400", 
    "1500", "1600", "1700", "1800", "1900", "2000", "2100", "2200", 
    "2300"), class = "factor")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

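2 answers:

Answer 0 (score: 2)

Since you are using dplyr anyway, you can use summarise instead of aggregate, which simplifies things a lot. Given a data frame like this (note that I have omitted some irrelevant variables):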

# A tibble: 30 x 3
   steps interval            group
   <int> <dttm>              <int>
 1     1 2012-11-01 00:00:00     1
 2     4 2012-11-01 00:05:00     1
 3     4 2012-11-01 00:10:00     1
 4     5 2012-11-01 00:15:00     1
 5     6 2012-11-01 00:20:00     1
 6     6 2012-11-01 00:25:00     2
 7     6 2012-11-01 00:30:00     2
 8     7 2012-11-01 00:35:00     2
 9     9 2012-11-01 00:40:00     2
10    10 2012-11-01 00:45:00     2
# … with 20 more rows

The following groups it by date and group, then computes a summary for each combination (here, the sum of steps):

library(dplyr)      # group_by(), summarize(), %>%
library(lubridate)  # date()

df %>% 
    group_by(date = date(interval), group) %>% 
    summarize(sum = sum(steps))

This produces the following:

# A tibble: 6 x 3
# Groups:   date [3]
  date       group   sum
  <date>     <int> <int>
1 2012-11-01     1    20
2 2012-11-01     2    38
3 2012-11-02     1    14
4 2012-11-02     2    42
5 2012-11-03     1    12
6 2012-11-03     2    38

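The main benefit here is clarity: you get the group sums without having to stack data frames afterwards. Alternatively, if you want to stick with base R, you can use something like aggregate(steps ~ group + date(interval), df, sum) or aggregate(df$steps, by = list(group = df$group, date = date(df$interval)), sum), both of which are also very concise here. (date() in these calls is the lubridate function; as.Date() plays a similar role in base R.)

Data: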
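As an illustration of the base R route, here is a minimal sketch that avoids lubridate by using as.Date(). The data frame toy and its values are made up for this example, and the time zone is assumed to be UTC:

```r
# Toy data (hypothetical values), shaped like df in this answer
toy <- data.frame(
  steps    = c(1L, 4L, 6L, 7L, 2L, 8L),
  interval = as.POSIXct(c("2012-11-01 00:00:00", "2012-11-01 00:05:00",
                          "2012-11-01 00:25:00", "2012-11-01 00:30:00",
                          "2012-11-02 00:00:00", "2012-11-02 00:25:00"),
                        tz = "UTC"),
  group    = c(1L, 1L, 2L, 2L, 1L, 2L)
)

# Derive the calendar date once, then sum steps per date and group
toy$date <- as.Date(toy$interval, tz = "UTC")
res <- aggregate(steps ~ date + group, data = toy, FUN = sum)

# Sort by date, then group, for easier reading
res <- res[order(res$date, res$group), ]
```

Computing the date column up front keeps the formula readable and makes the grouping explicit.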

df <- structure(list(steps = c(1L, 4L, 4L, 5L, 6L, 6L, 6L, 7L, 9L, 
10L, 1L, 2L, 3L, 3L, 5L, 7L, 8L, 8L, 9L, 10L, 1L, 2L, 2L, 3L, 
4L, 6L, 6L, 7L, 9L, 10L), interval = structure(c(1351728000, 
1351728300, 1351728600, 1351728900, 1351729200, 1351729500, 1351729800, 
1351730100, 1351730400, 1351730700, 1351814400, 1351814700, 1351815000, 
1351815300, 1351815600, 1351815900, 1351816200, 1351816500, 1351816800, 
1351817100, 1351900800, 1351901100, 1351901400, 1351901700, 1351902000, 
1351902300, 1351902600, 1351902900, 1351903200, 1351903500), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), group = c(1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L)), row.names = c(NA, -30L), class = c("tbl_df", 
"tbl", "data.frame"))

Answer 1 (score: 1)

Try splitting the data by date and then applying aggregate to each piece to sum steps by group.

lst1 <- lapply(split(a1, a1$date), function(x) aggregate(steps~group,x,sum))

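This should give you a list of data frames, one per date, each holding the sum of steps for every group. You can access an individual data frame with lst1[[1]], lst1[[2]], and so on.

To get the output in a single data frame, we can use do.call: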

do.call
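A minimal sketch of that last step, assuming the intent is to row-bind the list with rbind. The data frame below is a made-up stand-in for a1, and drop = TRUE is an addition not present in the answer's line above: it skips factor levels of date that have no rows, which would otherwise yield empty pieces that aggregate() cannot process.

```r
# Toy stand-in for a1 (hypothetical values); "2012-10-31" is an unused level
a1 <- data.frame(
  steps = c(1, 2, 3, 4, 5, 6),
  date  = factor(c("2012-11-01", "2012-11-01", "2012-11-01",
                   "2012-11-02", "2012-11-02", "2012-11-02"),
                 levels = c("2012-10-31", "2012-11-01", "2012-11-02")),
  group = factor(c("0", "0", "100", "0", "100", "100"))
)

# One aggregated data frame per date that actually has rows
lst1 <- lapply(split(a1, a1$date, drop = TRUE),
               function(x) aggregate(steps ~ group, x, sum))

# Row-bind the per-date results into one data frame
alpha <- do.call(rbind, lst1)

# Optional: record the date in a column and reset the row names
alpha$date <- rep(names(lst1), vapply(lst1, nrow, integer(1)))
rownames(alpha) <- NULL
```

Here alpha plays the role of the data frame that the loop in the question was meant to build.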