我有一个包含变量和已售商品数量的数据集:但是,有几天没有值。
我创建了一个数据集,其中所有销售额均为0,其余均为NA。如何将这些行添加到初始数据集中?
此刻,我有这个:
sales
day month year employees holiday sales
1 1 2018 14 0 1058
2 1 2018 25 1 2174
4 1 2018 11 0 987
sales.NA
day month year employees holiday sales
1 1 2018 NA NA 0
2 1 2018 NA NA 0
3 1 2018 NA NA 0
4 1 2018 NA NA 0
我想创建一个新的数据集,插入我没有观测值的天,销售额为0以及所有其他变量的NA。像这样
new.data
day month year employees holiday sales
1 1 2018 14 0 1058
2 1 2018 25 1 2174
3 1 2018 NA NA 0
4 1 2018 11 0 987
我尝试用过这样的东西
merge(sales.NA,sales, all.y=T, by = c("day","month","year"))
但这不起作用
答案 0 :(得分:1)
使用dplyr,您可以使用“ right_join”。例如:
sales <- data.frame(day = c(1,2,4),
month = c(1,1,1),
year = c(2018, 2018, 2018),
employees = c(14, 25, 11),
holiday = c(0,1,0),
sales = c(1058, 2174, 987)
)
sales.NA <- data.frame(day = c(1,2,3,4),
month = c(1,1,1,1),
year = c(2018,2018,2018, 2018)
)
right_join(sales, sales.NA)
这使您拥有
day month year employees holiday sales
1 1 1 2018 14 0 1058
2 2 1 2018 25 1 2174
3 3 1 2018 NA NA NA
4 4 1 2018 11 0 987
这将NA保留在您希望为0的销售中,但是可以通过在sales.NA
中包含销售数据来解决,也可以使用“ tidyr”
right_join(sales, sales.NA) %>% mutate(sales = replace_na(sales, 0))
答案 1 :(得分:0)
这是使用data.table
包的答案,因为我对语法更加熟悉,但是常规data.frames的工作原理几乎相同。我还将切换到适当的日期格式,这将使您的生活更加轻松。
实际上,以这种方式,您将不需要Sales.NA表,因为它将在第一次加入后具有NA的所有天自动解决。
library(data.table)
dt.dates <- data.table(Date = seq.Date(from = as.Date("2018-01-01"), to = as.Date("2018-12-31"),by = "day" ))
dt.sales <- data.table(day = c(1,2,4)
, month = c(1,1,1)
, year = c(2018,2018,2018)
, employees = c(14, 25, 11)
, holiday = c(0,1,0)
, sales = c(1058, 2174, 987)
)
dt.sales[, Date := as.Date(paste(year,month,day, sep = "-")) ]
merge( x = dt.dates
, y = dt.sales
, by.x = "Date"
, by.y = "Date"
, all.x = TRUE
)
> Date day month year employees holiday sales
1: 2018-01-01 1 1 2018 14 0 1058
2: 2018-01-02 2 1 2018 25 1 2174
3: 2018-01-03 NA NA NA NA NA NA
4: 2018-01-04 4 1 2018 11 0 987
...
答案 2 :(得分:0)
这是另一个data.table
解决方案:
jvars = c("day","month","year")
merge(sales.NA[, ..jvars], sales, by = jvars, all.x = TRUE)[is.na(sales), sales := 0L][]
day month year employees holiday sales
1: 1 1 2018 14 0 1058
2: 2 1 2018 25 1 2174
3: 3 1 2018 NA NA 0
4: 4 1 2018 11 0 987
或使用一些更简洁的语法:
sales[sales.NA[, ..jvars], on = jvars][is.na(sales), sales := 0][]
可重现数据:
sales <- structure(list(day = c(1L, 2L, 4L), month = c(1L, 1L, 1L), year = c(2018L,
2018L, 2018L), employees = c(14L, 25L, 11L), holiday = c(0L,
1L, 0L), sales = c(1058L, 2174L, 987L)), row.names = c(NA, -3L
), class = c("data.table", "data.frame"))
sales.NA <- structure(list(day = 1:4, month = c(1L, 1L, 1L, 1L), year = c(2018L,
2018L, 2018L, 2018L), employees = c(NA, NA, NA, NA), holiday = c(NA,
NA, NA, NA), sales = c(0L, 0L, 0L, 0L)), row.names = c(NA, -4L
), class = c("data.table", "data.frame"))