I'm really new to R and I have a problem I need to solve. I have a data frame like this:
str(data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 70128 obs. of 2 variables:
$ date: POSIXct, format: "2009-01-01 00:00:00" "2009-01-01 01:00:00" "2009-01-01 02:00:00" "2009-01-01 03:00:00" ...
$ value: num -0.6 -0.7 -0.6 -0.4 -0.4 -0.3 -0.3 -0.3 -0.1 0 ...
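So my date column is in POSIXct format with a step of 1 hour, and my value column is numeric and holds temperatures.
Now I want to delete whole days based on a condition: if only one cell within a day is below 3 (°C), I want to delete that day.
I searched for a while but couldn't solve it. I hope you can help me.
Thanks in advance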
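To try out the answers below, you can build a small stand-in with the same structure as `data`: an hourly POSIXct `date` column and a numeric `value` column. This is a minimal sketch under a few assumptions. It is a plain data.frame rather than a tibble, the values are made up, and the UTC time zone is an assumption.

```r
# Hypothetical stand-in for the asker's `data` (made-up values, UTC assumed)
data <- data.frame(
  # 48 hourly timestamps starting at 2009-01-01 00:00:00
  date  = seq(as.POSIXct("2009-01-01 00:00:00", tz = "UTC"),
              by = "hour", length.out = 48),
  # Day 1: every reading is 5 °C; day 2: one reading of 2.5 °C, the rest 6 °C
  value = c(rep(5, 24), 2.5, rep(6, 23))
)
str(data)
```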
Answer 0 (score: 2)
Compact dplyr syntax:
library(dplyr)
# Building an example data frame
df <- data.frame(
  datetime = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00",
                          "2009-01-01 02:00:00", "2009-01-01 03:00:00",
                          "2009-01-02 02:00:00", "2009-01-02 03:00:00",
                          "2009-01-03 04:00:00", "2009-01-03 02:00:00",
                          "2009-01-03 03:00:00", "2009-01-03 04:00:00",
                          "2009-01-04 03:00:00", "2009-01-04 04:00:00")),
  temp = c(1, -0.7, -0.6,
           -0.4, -0.4, -0.3,
           -0.3, 10, 4,
           0, 10, 5))
# Query: keep only the days on which no reading is below 3 °C
df %>%
  mutate(date = lubridate::as_date(datetime)) %>%  # calendar day of each reading
  group_by(date) %>%
  filter(all(temp >= 3))
#Result
datetime temp date
<dttm> <dbl> <date>
1 2009-01-04 03:00:00 10. 2009-01-04
2 2009-01-04 04:00:00 5. 2009-01-04
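The same rule (drop a day as soon as any reading on it is below 3 °C) can also be sketched in base R, without dplyr. In this sketch, `ave()` applies `all()` per day and spreads the result back to every row of that day. `format()` extracts the calendar day in the time zone stored on the POSIXct values. The four-row data frame reuses a subset of this answer's example values, and the UTC time zone is an assumption.

```r
# Subset of this answer's example data (UTC assumed)
df <- data.frame(
  datetime = as.POSIXct(c("2009-01-03 02:00:00", "2009-01-03 03:00:00",
                          "2009-01-04 03:00:00", "2009-01-04 04:00:00"),
                        tz = "UTC"),
  temp = c(10, 0, 10, 5)
)
# Calendar day of each reading, as a "YYYY-MM-DD" string
day <- format(df$datetime, "%Y-%m-%d")
# For each row: TRUE if every reading on that row's day is >= 3 °C
keep <- as.logical(ave(df$temp >= 3, day, FUN = all))
result <- df[keep, ]
result
```

Only the two 2009-01-04 rows remain, because 2009-01-03 contains a 0 °C reading.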
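Answer 1 (score: 1)
Using Pasqui's example from before the edit, slightly modified...
I chose to build the logic around my interpretation: a day is deleted if and only if exactly one cell/record in that day is below 3 °C. So if a day has two, three or more cells/records below 3 °C, it is kept. In this example, 2009-01-04 is the only day with exactly one cell/record below 3 °C, so it is deleted.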
library(dplyr)
library(lubridate)
# Building an example data frame
df <- data.frame(
  date = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00",
                      "2009-01-01 02:00:00", "2009-01-01 03:00:00",
                      "2009-01-01 04:00:00", "2009-01-01 05:00:00",
                      "2009-01-02 02:00:00", "2009-01-02 03:00:00",
                      "2009-01-03 04:00:00", "2009-01-03 02:00:00",
                      "2009-01-03 03:00:00", "2009-01-03 04:00:00",
                      "2009-01-04 00:00:00", "2009-01-04 01:00:00")),
  temp = c(1, -0.7, -0.6,
           -0.4, 3.5, 2.9, -0.4, -0.3,
           -0.3, 10, 4,
           0, 3.3, 2.5)
)
df2 <- df %>%
  mutate(
    day = as_date(date),  # calendar day of each record
    counter = 1           # one per record, summed per day below
  ) %>%
  group_by(day) %>%
  # Drop a day only if exactly one of its records is below 3 °C
  filter(sum(counter[temp < 3]) != 1)
df2
# A tibble: 12 x 4
# Groups: day [3]
date temp day counter
<dttm> <dbl> <date> <dbl>
1 2009-01-01 00:00:00 1.0 2009-01-01 1
2 2009-01-01 01:00:00 -0.7 2009-01-01 1
3 2009-01-01 02:00:00 -0.6 2009-01-01 1
4 2009-01-01 03:00:00 -0.4 2009-01-01 1
5 2009-01-01 04:00:00 3.5 2009-01-01 1
6 2009-01-01 05:00:00 2.9 2009-01-01 1
7 2009-01-02 02:00:00 -0.4 2009-01-02 1
8 2009-01-02 03:00:00 -0.3 2009-01-02 1
9 2009-01-03 04:00:00 -0.3 2009-01-03 1
10 2009-01-03 02:00:00 10.0 2009-01-03 1
11 2009-01-03 03:00:00 4.0 2009-01-03 1
12 2009-01-03 04:00:00 0.0 2009-01-03 1
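Under this answer's interpretation (drop a day only when exactly one reading is below 3 °C), the `counter` column is not strictly needed, because `sum(temp < 3)` already counts the low readings per day. In dplyr the filter could be written as `filter(sum(temp < 3) != 1)`. The sketch below is a base R version that uses `ave()` for the per-day count. It runs on a subset of the example values above, and the UTC time zone is an assumption.

```r
# Subset of this answer's example data (UTC assumed)
df <- data.frame(
  date = as.POSIXct(c("2009-01-02 02:00:00", "2009-01-02 03:00:00",
                      "2009-01-04 00:00:00", "2009-01-04 01:00:00"),
                    tz = "UTC"),
  temp = c(-0.4, -0.3, 3.3, 2.5)
)
# Calendar day of each record
day <- format(df$date, "%Y-%m-%d")
# Number of readings below 3 °C on each row's day
n_low <- ave(as.numeric(df$temp < 3), day, FUN = sum)
# Drop a day only if exactly one of its readings is below 3 °C
df2 <- df[n_low != 1, ]
df2
```

2009-01-02 has two low readings and is kept. 2009-01-04 has exactly one and is dropped.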
Answer 2 (score: 0)
Try adapting this code:
Toy data frame (2009-01-01 has only one hour with a value < 3, while 2009-01-02 has none):
df <- data.frame(date = c("2009-01-01 00:00:00", "2009-01-01 01:00:00", "2009-01-01 02:00:00", "2009-01-02 03:00:00"),
                 value = c(-0.6, 8, 4, 7))
df
date value
1 2009-01-01 00:00:00 -0.6
2 2009-01-01 01:00:00 8.0
3 2009-01-01 02:00:00 4.0
4 2009-01-02 03:00:00 7.0
Identify the days to delete:
date_to_delete<-unique(as.Date(df[df[,"value"]<3,"date"], format="%Y-%m-%d"))
Your cleaned data frame:
df[!(as.Date(df$date,format="%Y-%m-%d") %in% date_to_delete),]
date value
4 2009-01-02 03:00:00 7
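The asker's `date` column is POSIXct rather than character. Below is a sketch of the same two-step approach for POSIXct input: first collect the days that have any reading below 3, then drop them. It uses `format()` to get the day, so the day follows the time zone stored on the column. The UTC time zone here is an assumption, and the values are this answer's toy data.

```r
# This answer's toy data, but with a POSIXct date column (UTC assumed)
df <- data.frame(
  date = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00",
                      "2009-01-01 02:00:00", "2009-01-02 03:00:00"),
                    tz = "UTC"),
  value = c(-0.6, 8, 4, 7)
)
# Calendar day of each reading, in the column's own time zone
day <- format(df$date, "%Y-%m-%d")
# Days with at least one reading below 3
date_to_delete <- unique(day[df$value < 3])
# Keep only the rows whose day is not in that set
cleaned <- df[!(day %in% date_to_delete), ]
cleaned
```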