Conditional group summaries by week with dplyr

Date: 2017-07-30 00:18:37

Tags: r dplyr group-summaries

To complicate a previous question: suppose I have the following sock data.

>socks
year drawer  week  sock_total
1990  1       1        3                    
1990  1       2        4
1990  1       3        3 
1990  1       4        2 
1990  1       5        4
1990  2       1        1           
1990  2       2        1
1990  2       3        1
1990  2       4        1 
1990  2       5        2
1990  3       1        3
1990  3       2        4 
1990  3       3        4
1990  3       4        4
1990  3       5        4
1991  1       1        4
1991  1       2        3
1991  1       3        2
1991  1       4        2 
1991  1       5        3
1991  2       1        1           
1991  2       2        3
1991  2       3        4
1991  2       4        4
1991  2       5        3
1991  3       1        2           
1991  3       2        3
1991  3       3        3
1991  3       4        2
1991  3       5        3

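How can I use dplyr's summarise to create a new variable growth that is 1 if sock_total increased from the first year to the second year for that drawer and week, and 0 otherwise? The result should look like this: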

>socks
 drawer  week growth 
  1       1        1        
  1       2        0   
  1       3        0   
  1       4        0   
  1       5        0   
  2       1        0        
  2       2        1   
  2       3        1   
  2       4        1   
  2       5        1   
  3       1        0   
  3       2        0   
  3       3        0   
  3       4        0   
  3       5        0

Also, how should I handle a drawer that has no data for a given week in one of the years? If a week is missing, growth should be NA.
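Filling a missing week with NA amounts to turning an implicit missing row into an explicit one, which is what tidyr's complete() does. The following is a minimal sketch on a hypothetical toy data frame (toy and its values are made up for illustration), assuming exactly two years per drawer and week:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (made up for illustration):
# drawer 1 has weeks 1 and 2 in 1990, but only week 1 in 1991
toy <- data.frame(
  year       = c(1990L, 1990L, 1991L),
  drawer     = c(1L, 1L, 1L),
  week       = c(1L, 2L, 1L),
  sock_total = c(3L, 4L, 4L)
)

res <- toy %>%
  # add every missing year/drawer/week combination; sock_total is NA there
  complete(year, drawer, week) %>%
  # sort so that within each group [1] is the earlier year and [2] the later one
  arrange(year) %>%
  group_by(drawer, week) %>%
  # assumes exactly two years; an NA sock_total propagates to an NA growth
  summarise(growth = as.integer(sock_total[2] > sock_total[1]), .groups = "drop")

res
```

For drawer 1, week 1 this gives growth 1 (3 to 4), and for drawer 1, week 2 it gives NA because the 1991 row was filled in by complete() with an NA count. The .groups argument requires dplyr 1.0.0 or later.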

2 Answers:

Answer 0 (score: 1)

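This is very similar to the answer to the previous question, except that you group by drawer and week; @eipi10's comment is also a good alternative. To handle a year that is missing for a particular drawer and week, take the first element with [1] after subsetting; this turns a zero-length result into NA: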

For example:

library(dplyr)

df %>%
    group_by(drawer, week) %>%
    summarise(growth = +(sock_total[year==1991][1] - sock_total[year==1990][1] > 0))
#                                              ^^^                         ^^^
# A tibble: 15 x 3
# Groups:   drawer [?]
#   drawer  week growth
#    <int> <int>  <int>
# 1      1     1      1
# 2      1     2      0
# 3      1     3      0
# 4      1     4      0
# 5      1     5      0
# 6      2     1      0
# 7      2     2      1
# 8      2     3      1
# 9      2     4      1
#10      2     5      1
#11      3     1      0
#12      3     2      0
#13      3     3      0
#14      3     4      0
#15      3     5     NA

In this data, the 1991 row for drawer 3, week 5 has been removed:

df <- structure(list(year = c(1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 
1991L, 1991L, 1991L, 1991L, 1991L), drawer = c(1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), week = c(1L, 2L, 3L, 4L, 
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L), sock_total = c(3L, 4L, 3L, 
2L, 4L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 3L, 2L, 2L, 
3L, 1L, 3L, 4L, 4L, 3L, 2L, 3L, 3L, 2L)), .Names = c("year", 
"drawer", "week", "sock_total"), class = "data.frame", row.names = c(NA, 
-29L))
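The [1] trick relies on a base R rule: indexing a zero-length vector at position 1 returns NA instead of failing. A minimal illustration with made-up values for a single drawer/week where only the 1990 row exists:

```r
# Made-up values for one drawer/week group in which only 1990 is present
sock_total <- c(4L)
year       <- c(1990L)

sock_total[year == 1991]        # integer(0): there is no 1991 row
sock_total[year == 1991][1]     # NA: position 1 of a zero-length vector

# NA propagates through the subtraction and the comparison,
# and unary + turns TRUE/FALSE/NA into 1/0/NA (integer)
growth <- +(sock_total[year == 1991][1] - sock_total[year == 1990][1] > 0)
growth
```

Without the [1], the comparison would itself be zero-length, so summarise would not get the single value per group that it expects.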

Answer 1 (score: 1)

Or you can try this without complete:

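df %>%
  group_by(drawer, week) %>%
  # assumes rows within each group are sorted by year (1990 first);
  # a group with only one year gets 0, not NA
  summarise(growth = ifelse(n() <= 1, 0,
                            ifelse((sock_total[1] - sock_total[2]) >= 0, 0, 1)))
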
# A tibble: 15 x 3
# Groups:   drawer [?]
   drawer  week growth
    <int> <int>  <dbl>
 1      1     1      1
 2      1     2      0
 3      1     3      0
 4      1     4      0
 5      1     5      0
 6      2     1      0
 7      2     2      1
 8      2     3      1
 9      2     4      1
10      2     5      1
11      3     1      0
12      3     2      0
13      3     3      0
14      3     4      0
15      3     5      0
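The code above returns 0 rather than the requested NA when a drawer/week has only one year, and it relies on the rows already being sorted by year. A minimal variant that addresses both, sketched on a hypothetical toy data frame (toy and its values are made up), assuming at most two years per drawer and week:

```r
library(dplyr)

# Hypothetical toy data, deliberately not sorted by year;
# drawer 1, week 2 has no 1991 row
toy <- data.frame(
  year       = c(1991L, 1990L, 1990L),
  drawer     = c(1L, 1L, 1L),
  week       = c(1L, 1L, 2L),
  sock_total = c(4L, 3L, 4L)
)

res <- toy %>%
  # make [1] the earlier year and [2] the later one inside every group
  arrange(year) %>%
  group_by(drawer, week) %>%
  summarise(
    # NA (instead of 0) when one of the two years is missing
    growth = if (n() < 2) NA_integer_ else as.integer(sock_total[2] > sock_total[1]),
    .groups = "drop"
  )

res
```

Returning NA_integer_ in one branch and as.integer() in the other keeps the growth column a single integer type across groups.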