Drop IDs for which all values in the group are NA

Asked: 2018-10-30 07:10:37

Tags: r dplyr data.table

I prepared some data with the following code:

# # Data Preparation ----------------------
# Build 288 timestamps at 10-minute intervals (two full days) with base R
start_date <- as.POSIXct("2018-10-30 00:00:00")
dates <- seq(start_date, by = "10 min", length.out = 288)
date_val <- format(dates, '%d-%m-%Y')

weather.forecast.data <- data.frame(dateTime = dates, date = date_val, id = 'GH1', radiation = runif(288))
weather.forecast.data$radiation[(weather.forecast.data$id == 'GH1') & (weather.forecast.data$date == '30-10-2018')] = NA

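My task is to filter rows out of weather.forecast.data: for each unique combination of id and date, the rows should be dropped when all radiation values are missing.

I have code written with data.table: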

library(data.table)
setDT(weather.forecast.data)
weather.forecast.data[, dateid := paste(date, id, sep = "__")]
weather.forecast.data[, is_all_na := all(is.na(radiation)), dateid]
weather.forecast.data = weather.forecast.data[!(is_all_na), !c('dateid', 'is_all_na'), with = FALSE]

I am trying to do the same with dplyr functions and the pipe operator to make it more readable:

library(dplyr)
weather.forecast.data %>%
  mutate(dateid = paste(date, id, sep = "__")) %>%
  group_by(dateid) %>%
  summarise(is_all_na = all(is.na(radiation))) %>%
  filter(is_all_na) %>%
  select(dateid)

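I am able to retrieve the ids for which all values are missing. However, I am unable to remove those ids from the original data.

2 Answers:

Answer 0: (score: 4)

There is no need to paste the columns together; you can group_by multiple columns.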

library(dplyr)

weather.forecast.data %>%
   group_by(date, id) %>%
   filter(!all(is.na(radiation))) 

This removes the rows of every date and id group in which all radiation values are NA.
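As a quick check of the grouped filter, here is a minimal sketch on a small made-up data frame (the `toy` data and all variable names are assumptions for illustration, not part of the original question). It also shows one way the summarise-based pipeline from the question could be completed with `anti_join`, which should give the same rows.

```r
library(dplyr)

# Hypothetical toy data: every radiation value for ("d1", "A") is NA,
# while ("d2", "A") has one observed value.
toy <- data.frame(
  date      = c("d1", "d1", "d2", "d2"),
  id        = c("A", "A", "A", "A"),
  radiation = c(NA, NA, 0.5, NA)
)

# Grouped filter: the condition yields one logical per group,
# so each group is kept or dropped as a whole.
kept_filter <- toy %>%
  group_by(date, id) %>%
  filter(!all(is.na(radiation))) %>%
  ungroup()

# Alternative: find the all-NA groups first, then anti-join them away.
all_na_groups <- toy %>%
  group_by(date, id) %>%
  summarise(is_all_na = all(is.na(radiation))) %>%
  ungroup() %>%
  filter(is_all_na)

kept_join <- toy %>%
  anti_join(all_na_groups, by = c("date", "id"))
```

Note that a group with only some missing values, such as ("d2", "A") here, is kept in full, including its NA rows.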

Answer 1: (score: 3)

Here are some options using data.table:

1) Subset the original dataset using .I

setDT(weather.forecast.data)
weather.forecast.data[
    weather.forecast.data[, .I[sum(is.na(radiation))!=.N], by=.(date, id)]$V1
]
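To see what the inner call returns, here is a small sketch on made-up data (the `toy` table is an assumption for illustration). Within each group, `.I` holds the row numbers of that group in the full table. Indexing it with a single `TRUE` keeps all of them, and indexing with `FALSE` keeps none.

```r
library(data.table)

# Hypothetical toy data: group ("d1", "A") is entirely NA.
toy <- data.table(
  date      = c("d1", "d1", "d2", "d2"),
  id        = "A",
  radiation = c(NA, NA, 0.5, NA)
)

# Row numbers of groups where the NA count differs from the group size,
# i.e. groups that have at least one non-missing value.
idx <- toy[, .I[sum(is.na(radiation)) != .N], by = .(date, id)]$V1

# Subset the original table by those row numbers.
kept <- toy[idx]
```

Here `idx` contains rows 3 and 4 only, so both rows of the ("d2", "A") group survive.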

2) Use an anti-join

setDT(weather.forecast.data)[
    !weather.forecast.data[, all(is.na(radiation)), by=.(date, id)][(V1)],
    on=.(date, id)]

Output (hopefully this is what the OP is looking for, since no sample output was posted):

                dateTime       date  id  radiation
  1: 2018-10-31 00:00:00 31-10-2018 GH1 0.01794694
  2: 2018-10-31 00:10:00 31-10-2018 GH1 0.55482429
  3: 2018-10-31 00:20:00 31-10-2018 GH1 0.31422673
  4: 2018-10-31 00:30:00 31-10-2018 GH1 0.43734765
  5: 2018-10-31 00:40:00 31-10-2018 GH1 0.29053698
 ---                                              
140: 2018-10-31 23:10:00 31-10-2018 GH1 0.56968294
141: 2018-10-31 23:20:00 31-10-2018 GH1 0.26055891
142: 2018-10-31 23:30:00 31-10-2018 GH1 0.15140244
143: 2018-10-31 23:40:00 31-10-2018 GH1 0.59824054
144: 2018-10-31 23:50:00 31-10-2018 GH1 0.55101842
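The anti-join in option 2 may be easier to follow when split into two steps. The sketch below uses made-up data and a named column `all_na` instead of the default `V1`. Both the `toy` table and the names are assumptions for illustration.

```r
library(data.table)

# Hypothetical toy data: group ("d1", "A") is entirely NA.
toy <- data.table(
  date      = c("d1", "d1", "d2", "d2"),
  id        = "A",
  radiation = c(NA, NA, 0.5, NA)
)

# Step 1: one row per (date, id) group, keeping only the all-NA groups.
all_na_groups <- toy[, .(all_na = all(is.na(radiation))), by = .(date, id)][(all_na)]

# Step 2: anti-join, i.e. keep the rows of toy with no match in all_na_groups.
kept <- toy[!all_na_groups, on = .(date, id)]
```

The `!` in front of the joined table is what turns the join into an anti-join. Without it, the same call would return only the rows belonging to the all-NA groups.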