I prepared some data with the following code:
# # Data Preparation ----------------------
library(lubridate)
start_date <- "2018-10-30 00:00:00"
start_date <- as.POSIXct(start_date, origin="1970-01-01")
dates <- c(start_date)
for(i in 1:287) {
dates <- c(dates, start_date + minutes(i * 10))
}
dates <- as.POSIXct(dates, origin="1970-01-01")
date_val <- format(dates, '%d-%m-%Y')
weather.forecast.data <- data.frame(dateTime = dates, date = date_val, id = 'GH1', radiation = runif(288))
weather.forecast.data$radiation[(weather.forecast.data$id == 'GH1') & (weather.forecast.data$date == '30-10-2018')] = NA
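My task is to filter out of weather.forecast.data the rows belonging to any unique combination of id and date for which all radiation values are missing.
I have code written with data.table: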
library(data.table)
setDT(weather.forecast.data)
weather.forecast.data[, dateid := paste(date, id, sep = "__")]
weather.forecast.data[, is_all_na := all(is.na(radiation)), dateid]
weather.forecast.data = weather.forecast.data[!(is_all_na), !c('dateid', 'is_all_na'), with = FALSE]
I am trying to use dplyr functions and the pipe operator to make it more readable:
library(dplyr)
weather.forecast.data %>%
mutate(dateid = paste(date, id, sep = "__")) %>%
group_by(dateid) %>%
summarise(is_all_na = all(is.na(radiation))) %>%
filter(is_all_na) %>%
select(dateid)
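This lets me retrieve all the date/id combinations whose values are entirely missing. However, I am not able to remove those combinations from the original data.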
Answer 0 (score: 4)
There is no need to paste the columns together; you can group_by multiple columns:
library(dplyr)
weather.forecast.data %>%
group_by(date, id) %>%
filter(!all(is.na(radiation)))
This removes the rows of every date and id group in which all of the radiation values are NA.
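If you would rather keep the summarise() step from the question, one possible way to remove the flagged groups from the original data is an anti-join. The following is only a minimal sketch: it assumes dplyr 1.0 or later (for the `.groups` argument), and `df`, `all_na_groups` and `result` are made-up names on a toy data frame, not the question's data.

```r
library(dplyr)

# Hypothetical toy data: group (d1, A) is entirely NA,
# group (d2, A) has one observed value and one NA
df <- data.frame(
  date = c("d1", "d1", "d2", "d2"),
  id = "A",
  radiation = c(NA, NA, 0.5, NA)
)

# One row per (date, id) group in which every radiation value is missing
all_na_groups <- df %>%
  group_by(date, id) %>%
  summarise(is_all_na = all(is.na(radiation)), .groups = "drop") %>%
  filter(is_all_na)

# Keep only the rows whose (date, id) pair is NOT among the flagged groups
result <- anti_join(df, all_na_groups, by = c("date", "id"))
```

Joining on both columns avoids building a pasted key. Note also that the group_by() plus filter() pipeline returns a grouped tibble, so you may want to append ungroup() before further processing.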
Answer 1 (score: 3)
Here are some options using data.table:
1) Use .I to subset the original dataset
setDT(weather.forecast.data)
weather.forecast.data[
weather.forecast.data[, .I[sum(is.na(radiation))!=.N], by=.(date, id)]$V1
]
2) Use an anti-join
setDT(weather.forecast.data)[
!weather.forecast.data[, all(is.na(radiation)), by=.(date, id)][(V1)],
on=.(date, id)]
Output (hopefully this is what the OP is looking for, since no sample output was posted):
                dateTime       date  id  radiation
  1: 2018-10-31 00:00:00 31-10-2018 GH1 0.01794694
  2: 2018-10-31 00:10:00 31-10-2018 GH1 0.55482429
  3: 2018-10-31 00:20:00 31-10-2018 GH1 0.31422673
  4: 2018-10-31 00:30:00 31-10-2018 GH1 0.43734765
  5: 2018-10-31 00:40:00 31-10-2018 GH1 0.29053698
 ---
140: 2018-10-31 23:10:00 31-10-2018 GH1 0.56968294
141: 2018-10-31 23:20:00 31-10-2018 GH1 0.26055891
142: 2018-10-31 23:30:00 31-10-2018 GH1 0.15140244
143: 2018-10-31 23:40:00 31-10-2018 GH1 0.59824054
144: 2018-10-31 23:50:00 31-10-2018 GH1 0.55101842
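Another variant that may read more directly is to return .SD only for the groups that have at least one non-missing value. This is a sketch on a made-up toy table (`dt` and `kept` are hypothetical names), not a benchmarked alternative to the two options above, and it assumes that having the grouping columns moved to the front of the result is acceptable.

```r
library(data.table)

# Hypothetical toy data: group (d1, A) is entirely NA,
# group (d2, A) has one observed value and one NA
dt <- data.table(
  date = c("d1", "d1", "d2", "d2"),
  id = "A",
  radiation = c(NA, NA, 0.5, NA)
)

# For each (date, id) group, return its rows only when not all values are NA;
# groups for which the `if` yields NULL are dropped from the result
kept <- dt[, if (!all(is.na(radiation))) .SD, by = .(date, id)]
```

Partially missing groups such as (d2, A) are kept in full, including their NA rows; only groups with no observed radiation at all are dropped.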