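People are buying things, and for each ZIP code I have the date on which a person last purchased the item. I want to get the last non-contemporaneous date within the group, i.e. the most recent date in the same ZIP code that is strictly earlier than the row's own date.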
library(dplyr)  # provides %>% and arrange()
ZCTA5 = c("b", "c", "a", "b", "b", "c", "a", "a", "a", "c")
App.Complete.Date = c("2005-01-23", "2005-01-23",
"2006-07-13", "2006-11-21",
"2006-11-21", "2006-11-21",
"2007-01-01", "2007-01-01",
"2007-01-01", "2007-01-01")
# sort by ZIP code, then by date, and print the result
xxx <- data.frame(ZCTA5,App.Complete.Date) %>%
arrange(ZCTA5,App.Complete.Date); xxx
Last.Unique.Date.In.ZCTA5 =c(NA, "2006-07-13", "2006-07-13", "2006-07-13", NA, "2005-01-23",
"2005-01-23", NA, "2005-01-23", "2006-11-21")
Desired output
   ZCTA5 App.Complete.Date Last.Unique.Date.In.ZCTA5
1      a        2006-07-13                      <NA>
2      a        2007-01-01                2006-07-13
3      a        2007-01-01                2006-07-13
4      a        2007-01-01                2006-07-13
5      b        2005-01-23                      <NA>
6      b        2006-11-21                2005-01-23
7      b        2006-11-21                2005-01-23
8      c        2005-01-23                      <NA>
9      c        2006-11-21                2005-01-23
10     c        2007-01-01                2006-11-21
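I don't want to drop any observations. Doing this in place with a mutate would be ideal, but I know that joining afterwards on ZCTA5 and an individual ID (not shown here, but I do have one) would also work.
I can't figure out how to mutate a new variable by lagging over the unique App.Complete.Date values, so I'm stuck. Slicing is also too cumbersome, because I still need the last date without dropping the contemporaneous dates.
Edit: It is acceptable if the NA is replaced by the same row's App.Complete.Date.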
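The "lag over the unique dates" idea described above could be sketched roughly as follows. This is only a minimal illustration, not a method taken from the answers: `prev_distinct_date` is a hypothetical helper name introduced here, and the sketch assumes the date column has already been converted with `as.Date`.

```r
library(dplyr)

xxx <- data.frame(
  ZCTA5 = c("b", "c", "a", "b", "b", "c", "a", "a", "a", "c"),
  App.Complete.Date = as.Date(c("2005-01-23", "2005-01-23", "2006-07-13",
                                "2006-11-21", "2006-11-21", "2006-11-21",
                                "2007-01-01", "2007-01-01", "2007-01-01",
                                "2007-01-01"))
) %>%
  arrange(ZCTA5, App.Complete.Date)

# Hypothetical helper: for each element of a Date vector, return the latest
# distinct date in that vector that is strictly earlier (NA if there is none).
prev_distinct_date <- function(d) {
  u <- sort(unique(d))              # distinct dates, ascending
  # Prepending NA shifts the lookup by one: position i now holds u[i - 1].
  c(as.Date(NA), u)[match(d, u)]
}

result <- xxx %>%
  group_by(ZCTA5) %>%
  mutate(Last.Unique.Date.In.ZCTA5 = prev_distinct_date(App.Complete.Date)) %>%
  ungroup()

result
```

Because the helper works on the distinct dates and then maps back with `match`, rows that share a date receive the same earlier date and no rows are dropped.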
Answer 0 (score: 1)
Try the following:
xxx = xxx %>%
mutate(App.Complete.Date = as.Date(App.Complete.Date),
rn = row_number())
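This is the initial setup: it makes sure the date column is of Date type, and it adds a row number so that the original duplicate dates are preserved.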
yyy = xxx %>%
left_join(xxx, by = "ZCTA5") %>%
# discard all the out-of-scope dates
mutate(App.Complete.Date.y = ifelse(App.Complete.Date.y < App.Complete.Date.x,
App.Complete.Date.y, NA)) %>%
# we need to include row number here to preserve all rows in the original
group_by(ZCTA5, App.Complete.Date.x, rn.x) %>%
# na.rm = TRUE handles all the missing values removed in the previous mutate
summarise(App.Complete.Date.y = max(App.Complete.Date.y, na.rm = TRUE), .groups = 'drop') %>%
# summarise may return numeric type rather than date type - convert back
mutate(App.Complete.Date.y = as.Date(App.Complete.Date.y, origin = "1970-01-01")) %>%
# rename to output
select(ZCTA5,
App.Complete.Date = App.Complete.Date.x,
Last.Unique.Date.In.ZCTA5 = App.Complete.Date.y)
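You may need to change the origin argument in the last mutate, depending on the base date on your system. When my machine returned 13342 instead of "2006-07-13", I worked out that the base date was "1970-01-01", because "2006-07-13" is 13342 days after "1970-01-01".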
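The numeric value mentioned above can be checked in isolation. The following base R sketch (variable names are illustrative only) shows that `ifelse()` drops the Date class and leaves the underlying day count, which R stores as days since 1970-01-01, and that `as.Date()` with that origin restores the date.

```r
d <- as.Date("2006-07-13")

# The number stored underneath a Date: days since 1970-01-01.
n <- as.numeric(d)
n  # 13342

# ifelse() builds its result from the logical test vector, so the Date class
# is lost and only the day count remains.
stripped <- ifelse(TRUE, d, NA)
class(stripped)  # "numeric"

# Converting back with the matching origin recovers the original date.
back <- as.Date(stripped, origin = "1970-01-01")
back == d  # TRUE
```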