Question

My data looks like this:

df <- data.frame(Id=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,9,9,9,9),Date=c("2013-04","2013-12","2013-01","2013-12","2013-11",
             "2013-12","2012-04","2013-12","2012-08","2014-12","2013-08","2014-12","2013-08","2014-12","2011-01","2013-11","2013-12","2014-01","2014-04"))

To get the dates into a proper format:

df$Date <- paste0(df$Date,"-01")

I only need the years, so that each Id ends up with two dates (as years).

If I do this with the current data:

require(lubridate)
df$Date <- year(as.Date(df$Date)-days(1))

I sometimes get the same year twice for a given Id.

The desired output for the Date column is:
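A small base R sketch may show why this happens. It assumes the two dates of Id 1 from the data above and uses format() rather than lubridate; the variable names d and yrs are illustrative only. Subtracting one day from the first of a month only moves back a year when the month is January, so both dates here still fall in 2013.

```r
# Illustrative values: the two dates of Id 1 after appending "-01"
d <- as.Date(c("2013-04-01", "2013-12-01"))

# Subtract one day, then extract the year as an integer
yrs <- as.integer(format(d - 1, "%Y"))

print(yrs)  # both elements are 2013
```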

 2012 2013 2012 2013 2012 2013 2012 2013 2013 2014 2013 2014 2013 2014 2011 2013 2014

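Note that the last date for a given Id is always correct, so the previous year has to be corrected based on that last date. The dates must be in a format that can be converted to a year, as shown.

Edit: here is the case in question: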

Id Date 
1 2013-11-01    
1 2013-12-01     
1 2014-01-01    
1 2014-04-01

Currently I get: 2012, 2013, 2013, 2013

I need: 2012, 2013, 2013, 2014
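The rule implied by this example could be sketched in base R as follows. This assumes that the rows of an Id are meant to be treated as consecutive pairs, which is an interpretation of the example rather than something stated outright; fix_pairs is a hypothetical helper name.

```r
# Hypothetical helper (not from the original post): treat the rows of one Id
# as consecutive pairs and set the first year of each pair to the second minus 1.
fix_pairs <- function(dates) {
  yrs <- as.integer(substr(dates, 1, 4))  # year part of "YYYY-MM-DD"
  n <- length(yrs)
  if (n < 2) return(yrs)                  # a single row is left unchanged
  first <- seq(1, n - 1, by = 2)          # position of the first row of each pair
  yrs[first] <- yrs[first + 1] - 1L
  yrs
}

edit_case <- c("2013-11-01", "2013-12-01", "2014-01-01", "2014-04-01")
result <- fix_pairs(edit_case)
print(result)
```

With an odd number of rows, the last row has no partner and keeps its own year under this sketch.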

Answer 1

This is how I would solve it using the data.table package (although it seems too complicated to me):

library(data.table)
setDT(df)[, year := year(Date)][, 
            year := if(.N == 2) (year[2] - 1):year[2] else year,
            Id][]    

#     Id       Date year indx
#  1:  1 2013-04-01 2012    2
#  2:  1 2013-12-01 2013    2
#  3:  2 2013-01-01 2012    2
#  4:  2 2013-12-01 2013    2
#  5:  3 2013-11-01 2012    2
#  6:  3 2013-12-01 2013    2
#  7:  4 2012-04-01 2012    2
#  8:  4 2013-12-01 2013    2
#  9:  5 2012-08-01 2013    2
# 10:  5 2014-12-01 2014    2
# 11:  6 2013-08-01 2013    2
# 12:  6 2014-12-01 2014    2
# 13:  7 2013-08-01 2013    2
# 14:  7 2014-12-01 2014    2
# 15:  8 2011-01-01 2011    1

Or in one step (thanks to @Arun for this):

setDT(df)[, year := {tmp = year(Date); 
            if (.N == 2L) (tmp[2]-1L):tmp[2] else tmp},
            Id]

Edit: for the OP's new data, the code can be modified by adding an extra index:

setDT(df)[, indx := if(.N > 2) rep(seq_len(.N/2), each = 2) + 1L else .N, Id]
df[, year := {tmp = year(Date); if (.N > 1L) (tmp[2] - 1L):tmp[2] else tmp}, list(Id, indx)][]

#     Id       Date indx year
#  1:  1 2013-04-01    2 2012
#  2:  1 2013-12-01    2 2013
#  3:  2 2013-01-01    2 2012
#  4:  2 2013-12-01    2 2013
#  5:  3 2013-11-01    2 2012
#  6:  3 2013-12-01    2 2013
#  7:  4 2012-04-01    2 2012
#  8:  4 2013-12-01    2 2013
#  9:  5 2012-08-01    2 2013
# 10:  5 2014-12-01    2 2014
# 11:  6 2013-08-01    2 2013
# 12:  6 2014-12-01    2 2014
# 13:  7 2013-08-01    2 2013
# 14:  7 2014-12-01    2 2014
# 15:  8 2011-01-01    1 2011
# 16:  9 2013-11-01    2 2012
# 17:  9 2013-12-01    2 2013
# 18:  9 2014-01-01    3 2013
# 19:  9 2014-04-01    3 2014
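To see what the extra index does, here is a minimal base R sketch of the same expression outside data.table, assuming a group of four rows as for Id 9; n stands in for .N and is an illustrative name.

```r
# n plays the role of data.table's .N (number of rows in the group)
n <- 4L

# Same expression as in the answer: rows are labelled in blocks of two
indx <- if (n > 2) rep(seq_len(n / 2), each = 2) + 1L else n

print(indx)  # rows 1-2 share one label, rows 3-4 share the next
```

Grouping by Id and this index then lets the two-row correction run separately on each pair.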

Or another possible solution, provided by @akrun:

setDT(df)[, `:=`(year = year(Date), indx = .N, indx2 = as.numeric(gl(.N,2, .N))), Id]
df[indx > 1, year:=(year[2]-1):year[2], list(Id, indx2)][]

Answer 2

Using dplyr, with an approach similar to @David Arenburg's:

library(dplyr)
df %>%
  group_by(Id) %>%
  mutate(year=as.numeric(sub('-.*', '', Date)),
         year=replace(year, n()>1, c(year[2]-1, year[2])))

#   Id    Date year
#1   1 2013-04 2012
#2   1 2013-12 2013
#3   2 2013-01 2012
#4   2 2013-12 2013
#5   3 2013-11 2012
#6   3 2013-12 2013
#7   4 2012-04 2012
#8   4 2013-12 2013
#9   5 2012-08 2013
#10  5 2014-12 2014
#11  6 2013-08 2013
#12  6 2014-12 2014
#13  7 2013-08 2013
#14  7 2014-12 2014
#15  8 2011-01 2011

Or using base R

Update: you could try

with(df, ave(as.numeric(sub('-.*', '', Date)), Id, 
     FUN=function(x) if(length(x)>1)(x[2]-1):x[2] else x))

#[1] 2012 2013 2012 2013 2012 2013 2012 2013 2013 2014 2013 2014 2013 2014 2011

Or:

df$indx <- with(df, ave(Id, Id, FUN=function(x) (seq_along(x)-1)%/%2+1))

with(df, ave(as.numeric(sub('-.*', '', Date)), Id, indx, 
         FUN=function(x) if(length(x)>1)(x[2]-1):x[2] else x)) 
#[1] 2012 2013 2012 2013 2012 2013 2012 2013 2013 2014 2013 2014 2013 2014 2011
#[16] 2012 2013 2013 2014

Answer 3

Here is a dplyr solution. You can drop the intermediate fields last_year and year2, but I have left them in for clarity:

library(stringr)
library(dplyr)

df %>%
  group_by(Id) %>%
  mutate(
    last_year = last(as.integer(str_sub(Date, 1, 4))),
    year2 = row_number() - n(),
    year = last_year + year2
    )
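As a rough check, the same arithmetic can be sketched in base R for the four-row case from the question's edit (substr() stands in for str_sub(); the variable names are illustrative). Counting back from the last year gives consecutive years, which is not the pairwise result asked for in the edit.

```r
# The four dates of the edited example
dates <- c("2013-11-01", "2013-12-01", "2014-01-01", "2014-04-01")

# Year of the last row in the group
last_year <- as.integer(substr(dates[length(dates)], 1, 4))

# Equivalent of row_number() - n(): -3, -2, -1, 0
offset <- seq_along(dates) - length(dates)

year <- last_year + offset
print(year)
```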

Correct the previous year by Id in R
