To add a complication to a previous question, suppose I have the following sock data:
>socks
year drawer week sock_total
1990 1 1 3
1990 1 2 4
1990 1 3 3
1990 1 4 2
1990 1 5 4
1990 2 1 1
1990 2 2 1
1990 2 3 1
1990 2 4 1
1990 2 5 2
1990 3 1 3
1990 3 2 4
1990 3 3 4
1990 3 4 4
1990 3 5 4
1991 1 1 4
1991 1 2 3
1991 1 3 2
1991 1 4 2
1991 1 5 3
1991 2 1 1
1991 2 2 3
1991 2 3 4
1991 2 4 4
1991 2 5 3
1991 3 1 2
1991 3 2 3
1991 3 3 3
1991 3 4 2
1991 3 5 3
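How can I use summarise in dplyr to create a new variable, growth, that equals 1 if the sock total for a drawer and week increased between the first and second year, and 0 otherwise? The data should look like this: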
>socks
drawer week growth
1 1 1
1 2 0
1 3 0
1 4 0
1 5 0
2 1 0
2 2 1
2 3 1
2 4 1
2 5 1
3 1 0
3 2 0
3 3 0
3 4 0
3 5 0
Also, how should I handle a drawer that has no data for a given week in one of the years? If a week is missing, growth should be NA.
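For reference, the rule and the expected output above can also be reproduced without dplyr. The sketch below is a base-R alternative, not the approach from either answer: it pairs the two years with merge(), and all = TRUE keeps drawer/week pairs that appear in only one year, so a missing week comes out as NA. The helper name growth_by_week and its default years 1990/1991 are assumptions made for illustration.

```r
# Sock data from the question, typed in compactly (same values as the listing)
socks <- data.frame(
  year       = rep(c(1990L, 1991L), each = 15),
  drawer     = rep(rep(1:3, each = 5), times = 2),
  week       = rep(1:5, times = 6),
  sock_total = c(3L, 4L, 3L, 2L, 4L,  1L, 1L, 1L, 1L, 2L,  3L, 4L, 4L, 4L, 4L,
                 4L, 3L, 2L, 2L, 3L,  1L, 3L, 4L, 4L, 3L,  2L, 3L, 3L, 2L, 3L)
)

# Hypothetical helper: 1 if sock_total rose from `first` to `second`, 0 if not,
# NA if the drawer/week pair is missing in either year
growth_by_week <- function(d, first = 1990L, second = 1991L) {
  keys <- c("drawer", "week")
  a <- d[d$year == first,  c(keys, "sock_total")]
  b <- d[d$year == second, c(keys, "sock_total")]
  # all = TRUE keeps pairs found in only one year; the absent side is filled with NA
  m <- merge(a, b, by = keys, all = TRUE, suffixes = c("_first", "_second"))
  m$growth <- as.integer(m$sock_total_second > m$sock_total_first)
  m <- m[order(m$drawer, m$week), c(keys, "growth")]
  rownames(m) <- NULL
  m
}

full <- growth_by_week(socks)
# Drop 1991, drawer 3, week 5 to mimic a missing week
gap <- growth_by_week(socks[!(socks$year == 1991 & socks$drawer == 3 & socks$week == 5), ])
```

full should match the expected table; gap is identical except that its last row (drawer 3, week 5) has growth NA.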
Answer 0 (score: 1)
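The answer is very similar to the previous one, but groups by drawer and week (@eipi10's comment is also a good option). To handle a missing year for a specific drawer and week, you can index with [1] after the subset, which converts a zero-length object to NA.
For example: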
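library(dplyr)
# df is the data frame from the dput() below; [1] after each subset turns a missing year into NA
df %>%
  group_by(drawer, week) %>%
  summarise(growth = +(sock_total[year == 1991][1] - sock_total[year == 1990][1] > 0))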
# A tibble: 15 x 3
# Groups: drawer [?]
# drawer week growth
# <int> <int> <int>
# 1 1 1 1
# 2 1 2 0
# 3 1 3 0
# 4 1 4 0
# 5 1 5 0
# 6 2 1 0
# 7 2 2 1
# 8 2 3 1
# 9 2 4 1
#10 2 5 1
#11 3 1 0
#12 3 2 0
#13 3 3 0
#14 3 4 0
#15 3 5 NA
Data (1991, drawer 3, week 5 is missing):
df <- structure(list(year = c(1990L, 1990L, 1990L, 1990L, 1990L, 1990L,
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L,
1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L,
1991L, 1991L, 1991L, 1991L, 1991L), drawer = c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), week = c(1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L), sock_total = c(3L, 4L, 3L,
2L, 4L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 3L, 2L, 2L,
3L, 1L, 3L, 4L, 4L, 3L, 2L, 3L, 3L, 2L)), .Names = c("year",
"drawer", "week", "sock_total"), class = "data.frame", row.names = c(NA,
-29L))
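The effect of the [1] can be checked outside summarise(). This is a hypothetical sketch that mimics the vectors summarise() sees for a single drawer/week group; the variable names no_index, growth_missing and growth_present are made up for illustration.

```r
# Hypothetical one-group slices, shaped like what summarise() sees per drawer/week
# Drawer 3, week 5 in the data above: only the 1990 row exists
year <- 1990L
sock_total <- 4L

no_index <- sock_total[year == 1991] - sock_total[year == 1990]
length(no_index)  # 0: subtracting from a zero-length vector stays zero-length

# [1] on a zero-length vector returns NA, so the result has length 1
growth_missing <- +(sock_total[year == 1991][1] - sock_total[year == 1990][1] > 0)

# Drawer 1, week 1: both years present (3 socks in 1990, 4 in 1991)
year <- c(1990L, 1991L)
sock_total <- c(3L, 4L)
growth_present <- +(sock_total[year == 1991][1] - sock_total[year == 1990][1] > 0)

growth_missing  # NA
growth_present  # 1
```

The unary + simply converts the logical comparison to an integer (TRUE to 1, FALSE to 0) and leaves NA as NA.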
Answer 1 (score: 1)
Alternatively, you can try this without complete:
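library(dplyr)
df %>% group_by(drawer, week) %>%
  # a group with one row (missing week) gets 0; otherwise 1 if row 2 (1991) > row 1 (1990)
  summarise(growth = ifelse(n() <= 1, 0, ifelse(sock_total[1] - sock_total[2] >= 0, 0, 1)))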
# A tibble: 15 x 3
# Groups: drawer [?]
#    drawer  week growth
#     <int> <int>  <dbl>
#  1      1     1      1
#  2      1     2      0
#  3      1     3      0
#  4      1     4      0
#  5      1     5      0
#  6      2     1      0
#  7      2     2      1
#  8      2     3      1
#  9      2     4      1
# 10      2     5      1
# 11      3     1      0
# 12      3     2      0
# 13      3     3      0
# 14      3     4      0
# 15      3     5      0
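As the output shows, this version gives 0 rather than NA for drawer 3, week 5, and it relies on the 1990 row coming before the 1991 row within each group. A small variant (an adaptation, not part of the original answer) that sorts by year first and returns NA when only one year is present might look like this; it assumes dplyr 1.0 or later for the .groups argument.

```r
library(dplyr)

# Same data as the dput() in the first answer, minus 1991 / drawer 3 / week 5
df <- data.frame(
  year       = rep(c(1990L, 1991L), c(15, 14)),
  drawer     = c(rep(1:3, each = 5), rep(1:3, c(5, 5, 4))),
  week       = c(rep(1:5, times = 3), 1:5, 1:5, 1:4),
  sock_total = c(3L, 4L, 3L, 2L, 4L,  1L, 1L, 1L, 1L, 2L,  3L, 4L, 4L, 4L, 4L,
                 4L, 3L, 2L, 2L, 3L,  1L, 3L, 4L, 4L, 3L,  2L, 3L, 3L, 2L)
)

# Adaptation of the answer above (assumes dplyr >= 1.0 for .groups)
res <- df %>%
  arrange(year) %>%                      # put 1990 before 1991 inside every group
  group_by(drawer, week) %>%
  summarise(
    growth = if (n() < 2) NA_integer_    # only one year present: NA instead of 0
             else as.integer(sock_total[2] > sock_total[1]),
    .groups = "drop"
  )
res
```

Using if/else instead of nested ifelse() is enough here because n() is a single value per group.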