Data:
DB1 <- data.frame(orderItemID = 1:10,
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),
deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03", "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))
Expected result:
DB1 <- data.frame(orderItemID = 1:10,
orderDate= c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),
deliveryDate = c("2013-01-23", "2013-03-01", "2013-04-14", "2013-06-04", "2014-01-03", "2014-02-21", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))
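My question is similar to another one I posted, so please don't confuse the two. As you can see, I have some missing values in the delivery date, and I want to replace them with another date. That date should be the order date of the specific item plus the average delivery time in (full) days (2 days). The average delivery time is the mean over all samples that contain no missing values: (2 days + 1 day + 3 days + 2 days + 1 day + 2 days + 1 day + 2 days) / 8 = 1.75 days.
So I want to replace each NA in the delivery date with the order date + 2 days. Where there is no NA, the date should stay unchanged.
I have already tried this (using lubridate), but it does not work properly: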
DB1$deliveryDate[is.na(DB1$deliveryDate) ] <- DB1$orderDate + days(2)
Can anyone help me?
Answer 0: (score: 4)
First, convert the columns to Date objects:
DB1[,2:3]<-lapply(DB1[,2:3],as.Date)
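This conversion also deals with the quoted "NA" strings in the question's data: as.Date() cannot parse them and returns a real NA instead. A quick check on a hypothetical two-element vector (x and parsed are names introduced here for illustration):

```r
# Hypothetical vector mimicking the question's deliveryDate column
x <- c("2013-01-23", "NA")

is.na(x)              # the string "NA" is not a missing value yet

parsed <- as.Date(x)  # "NA" cannot be parsed as a date
is.na(parsed)         # so it becomes a real NA after conversion
```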
Then, replace the NA elements:
DB1$deliveryDate[is.na(DB1$deliveryDate)] <-
DB1$orderDate[is.na(DB1$deliveryDate)] +
mean(difftime(DB1$orderDate,DB1$deliveryDate,units="days"),na.rm=TRUE)
# orderItemID orderDate deliveryDate
#1 1 2013-01-21 2013-01-23
#2 2 2013-03-31 2013-03-01
#3 3 2013-04-12 2013-04-14
#4 4 2013-06-01 2013-06-04
#5 5 2014-01-01 2014-01-03
#6 6 2014-02-19 2014-02-21
#7 7 2014-02-27 2014-02-28
#8 8 2014-10-02 2014-10-04
#9 9 2014-10-31 2014-11-01
#10 10 2014-11-21 2014-11-23
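Note that difftime(DB1$orderDate, DB1$deliveryDate) is order minus delivery. Its mean comes out positive here (2.125 days) only because row 2 of the sample data has a delivery date (2013-03-01) that lies 30 days before its order date. A minimal sketch that computes delivery minus order and rounds to whole days, assuming row 2 is a typo for 2013-04-01 (the next answer makes the same assumption); is_missing and avg_days are names introduced here:

```r
# Sample data with real NAs; row 2 delivery date assumed to be 2013-04-01
DB1 <- data.frame(
  orderItemID  = 1:10,
  orderDate    = as.Date(c("2013-01-21", "2013-03-31", "2013-04-12", "2013-06-01",
                           "2014-01-01", "2014-02-19", "2014-02-27", "2014-10-02",
                           "2014-10-31", "2014-11-21")),
  deliveryDate = as.Date(c("2013-01-23", "2013-04-01", NA, "2013-06-04",
                           "2014-01-03", NA, "2014-02-28", "2014-10-04",
                           "2014-11-01", "2014-11-23"))
)

is_missing <- is.na(DB1$deliveryDate)

# Delivery minus order, averaged over complete rows, rounded to whole days
avg_days <- round(mean(as.numeric(DB1$deliveryDate - DB1$orderDate), na.rm = TRUE))

# Fill only the missing delivery dates; existing ones stay unchanged
DB1$deliveryDate[is_missing] <- DB1$orderDate[is_missing] + avg_days
```

Rounding first keeps the stored dates whole, instead of adding a fractional number of days to a Date.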
Answer 1: (score: 3)
You can do it like this:
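# Convert the date columns to Date so that they can be subtracted
DB1 = cbind(DB1$orderItemID, as.data.frame(lapply(DB1[-1], as.Date)))
# Average delivery time over the complete rows, rounded to whole days
days = round(mean(DB1$deliveryDate - DB1$orderDate, na.rm = TRUE))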
mask = is.na(DB1$deliveryDate)
DB1$deliveryDate[mask] = DB1$orderDate[mask]+days
# DB1$orderItemID orderDate deliveryDate
#1 1 2013-01-21 2013-01-23
#2 2 2013-03-31 2013-04-01
#3 3 2013-04-12 2013-04-14
#4 4 2013-06-01 2013-06-04
#5 5 2014-01-01 2014-01-03
#6 6 2014-02-19 2014-02-21
#7 7 2014-02-27 2014-02-28
#8 8 2014-10-02 2014-10-04
#9 9 2014-10-31 2014-11-01
#10 10 2014-11-21 2014-11-23
I re-entered your data first, because it was not clean:
DB1 <- data.frame(orderItemID = 1:10,
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),
deliveryDate = c("2013-01-23", "2013-04-01", NA, "2013-06-04", "2014-01-03", NA, "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))
Answer 2: (score: 1)
Assuming you have entered your data like this (note that the NAs are not wrapped in quotes, so they are read as NA rather than "NA")...
DB1 <- data.frame(orderItemID = 1:10,
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),
deliveryDate = c("2013-01-23", "2013-03-01", NA, "2013-06-04", "2014-01-03", NA, "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"),
stringsAsFactors = FALSE)
...and, following Nicola's answer, have done this to get the formats right...
DB1[,2:3]<-lapply(DB1[,2:3],as.Date)
...this also works:
library(lubridate)
DB1$deliveryDate <- with(DB1, as.Date(ifelse(is.na(deliveryDate), orderDate + days(2), deliveryDate), origin = "1970-01-01"))
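The as.Date(..., origin = "1970-01-01") wrapper is needed because base R's ifelse() drops the Date class and returns plain day counts. A small sketch in base R only, using hypothetical vectors o and d and adding 2 as a plain number instead of days(2):

```r
# Hypothetical order and delivery dates; the second delivery date is missing
o <- as.Date(c("2013-01-21", "2013-04-12"))
d <- as.Date(c("2013-01-23", NA))

# ifelse() strips attributes, so the result is numeric (days since 1970-01-01)
raw <- ifelse(is.na(d), o + 2, d)
class(raw)

# Converting back with an explicit origin restores the Date class
restored <- as.Date(raw, origin = "1970-01-01")
restored
```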
Or you can use dplyr and pipe it:
library(lubridate)
library(dplyr)
DB2 <- DB1 %>%
  # ifelse() returns plain day counts, so the second step converts them back to Date
  mutate(deliveryDate = ifelse(is.na(deliveryDate), orderDate + days(2), deliveryDate)) %>%
  mutate(deliveryDate = as.Date(deliveryDate, origin = "1970-01-01"))
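As a possible simplification, dplyr's if_else() keeps the Date class, so the conversion back from numbers should not be needed. A minimal sketch on a hypothetical three-row subset of the data, adding 2 as a plain number of days rather than days(2):

```r
library(dplyr)

# Hypothetical three-row subset of the question's data, already converted to Date
DB1 <- data.frame(
  orderItemID  = 1:3,
  orderDate    = as.Date(c("2013-01-21", "2013-04-12", "2013-06-01")),
  deliveryDate = as.Date(c("2013-01-23", NA, "2013-06-04"))
)

# if_else() requires both branches to be the same type and preserves it
DB2 <- DB1 %>%
  mutate(deliveryDate = if_else(is.na(deliveryDate), orderDate + 2, deliveryDate))
```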