Question

I did a fair amount of research before asking here, but could you give me some ideas on this problem?

My data frame (df) looks like this:

client id   value    repmonth
123          100     2012-01-31
123          200     2012-02-31
123          300     2012-05-31

So I have 2 missing months. I would like my data frame to look like this:

client id   value    repmonth
123          100     2012-01-31
123          200     2012-02-31
123          200     2012-03-31
123          200     2012-04-31
123          300     2012-05-31

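The code should fill in the missing repmonth values and populate each new row with the last observed value (200 in this example) and the client id.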

I have tried the following:

zoo library
tidyr library
dplyr library
POSIXct

As for code: ... lots of failed attempts

library(tidyr)
df %>%
  mutate (repmonth = as.Date(repmonth)) %>%
  complete(repmonth = seq.Date(min(repmonth), max(repmonth),by ="month"))

or

library(dplyr)

df$reportingDate.end.month <- as.POSIXct(df$datetime, tz = "GMT")
df <- tbl_df(df)   

list_df <- list(df, df)    # fake list of data.frames
seq_df <- data_frame(datetime = seq.POSIXt(as.POSIXct("2012-01-31"), 
                                           as.POSIXct("2018-12-31"), 
                                           by="month"))

lapply(list_df, function(x){full_join(total_loan_portfolios_3$reportingDate.end.month, seq_df, by=reportingDate.end.month)})

total_loan_portfolios_3$reportingmonth_notmissing <- full_join(seq_df,total_loan_portfolios_3$reportingDate.end.month)

or

library(dplyr)

ts <- seq.POSIXt(as.POSIXct("2012-01-01",'%d/%m/%Y'), as.POSIXct("2018/12/01",'%d/%m/%Y'), by="month")

ts <- seq.POSIXt(as.POSIXlt("2012-01-01"), as.POSIXlt("2018-12-01"), by="month")
ts <- format.POSIXct(ts,'%d/%m/%Y')

df <- data.frame(timestamp=ts)

total_loan_portfolios_3 <- full_join(df,total_loan_portfolios_3$Reporting_date)

In the end, I get a lot of errors, such as

format is not a date

or

Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) :
  'from' must be a finite number

and others.

Answer 1

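The solution below uses the lubridate and tidyr packages. Note that the dates in the OP's example are not valid (February and April have no 31st), but they imply data entered as the last day of each month, so I tried to replicate that here. The solution builds a sequence of dates from the minimum input date to the maximum input date to get every possible month. The start date is first normalized to the first day of its month so that the sequence is generated correctly. Once the sequence exists, a left join merges it with the data we have, which exposes the missing months as rows of NA. Finally, fill() is applied to the columns to carry the last observed values down into those NAs.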
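As a side check on the invalid dates mentioned above, here is a minimal sketch (not part of the original answer) of how base R treats a date such as "2012-02-31". The vector `x` is made up from the OP's sample values. With an explicit format, `as.Date()` returns NA for a date that does not exist, and `min()` then propagates that NA, which is one plausible source of the "'from' must be a finite number" error in the question.

```r
# Two valid month-end dates and one impossible one (February has no 31st).
x <- c("2012-01-31", "2012-02-31", "2012-05-31")

# With an explicit format, dates that do not exist are parsed as NA.
d <- as.Date(x, format = "%Y-%m-%d")
print(d)

# min() propagates the NA, so a sequence built from it has no finite start.
print(min(d))

# Dropping the NAs gives usable endpoints again.
print(range(d, na.rm = TRUE))
```

The full solution follows.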

library(lubridate)
library(tidyr)
# Note: the OP lists February with 31 days, which fails to parse as a date; corrected to the 29th (2012 is a leap year)
df <- data.frame(client_id=c(123,123,123),value=c(100,200,300),repmonth=c("2012-01-31","2012-02-29","2012-05-31"),stringsAsFactors = F)

df$repmonth <- ymd(df$repmonth) #convert character dates to Dates
start_month <- min(df$repmonth)
start_month <- start_month - days(day(start_month)-1) #first day of the month, so that seq.Date steps month by month correctly

all_dates <- seq.Date(from=start_month,to=max(df$repmonth),by="1 month")
all_dates <- (all_dates %m+% months(1)) - days(1) #all end-of-month-day since OP suggests having last-day-of-month input?
all_dates <- data.frame(repmonth=all_dates)
df<-merge(x=all_dates,y=df,by="repmonth",all.x=T)

df <- fill(df,c("client_id","value"))

The solution yields:

> df
    repmonth client_id value
1 2012-01-31       123   100
2 2012-02-29       123   200
3 2012-03-31       123   200
4 2012-04-30       123   200
5 2012-05-31       123   300
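
If the data holds more than one client, the same idea can be applied per group. The following is a minimal sketch, assuming dplyr is available alongside tidyr and lubridate; the second client (456) and the names `month_start` and `filled` are made up for illustration and do not come from the question. It normalizes each date to the first of its month, uses `complete()` within each client to add the missing months, carries values down with `fill()`, and then converts back to month-end dates with the same `%m+%` idiom used above.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Sample data: client 123 is from the question, client 456 is hypothetical.
df <- data.frame(
  client_id = c(123, 123, 123, 456, 456),
  value     = c(100, 200, 300, 10, 20),
  repmonth  = as.Date(c("2012-01-31", "2012-02-29", "2012-05-31",
                        "2012-03-31", "2012-05-31"))
)

filled <- df %>%
  # Work on first-of-month dates so that a monthly sequence is well defined.
  mutate(month_start = floor_date(repmonth, "month")) %>%
  select(-repmonth) %>%
  group_by(client_id) %>%
  # Add one row per missing month between each client's first and last month.
  complete(month_start = seq(min(month_start), max(month_start), by = "month")) %>%
  # Carry the last observed value down into the newly created rows.
  fill(value) %>%
  ungroup() %>%
  # Convert back to the last day of each month.
  mutate(repmonth = (month_start %m+% months(1)) - days(1)) %>%
  select(client_id, value, repmonth)

print(filled)
```

Grouping before `complete()` keeps each client's date range separate, so a client that starts later is not padded back to another client's first month.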
