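I have a data frame with one temperature measurement every 10 minutes. The measurements were taken at different locations (named "LCZ"), and the values for each location are in separate columns.
Here is part of my data frame (it also contains missing values, NA):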
Time `LCZ 3-2` `LCZ 3-10` `LCZ 6-1` `LCZ 6-9` `LCZ 9-4`
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2017-08-26 17:00:00 27.5 27.5 27.5 27.0 27.0
2 2017-08-26 17:10:00 27.5 27.0 27.5 27.0 27.0
3 2017-08-26 17:20:00 27.5 27.0 27.0 27.0 27.0
4 2017-08-26 17:30:00 27.0 26.5 27.0 26.5 26.5
5 2017-08-26 17:40:00 26.5 26.5 26.5 26.5 26.5
6 2017-08-26 17:50:00 26.5 26.0 26.5 26.0 26.5
7 2017-08-26 18:00:00 26.5 26.0 26.5 26.5 26.5
8 2017-08-26 18:10:00 27.0 26.0 26.5 26.5 26.0
9 2017-08-26 18:20:00 26.5 26.5 26.5 26.5 26.0
10 2017-08-26 18:30:00 26.5 26.5 26.5 26.5 26.0
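For each location (column), I would like to calculate the hourly min/max/median temperature and, in addition, the timestamps in the original data at which the hourly min and max occurred.
Is this possible with R?
I have already tried various functions. group_by allows me to calculate min/max for every column, but without the timestamps. period.apply also lets me calculate min/max/median, but only for one column. aggregate() did not bring any success either.
I am still learning R and have not found a solution to this problem.
This site has helped me with all sorts of problems, but here I am really stuck. Can anybody help? Thanks in advance.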
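Answer 0 (score: 5)
We can use floor_date from the lubridate package to create a new column, Time2, that holds the hour each measurement falls in. If that is not how you want to define the hourly groups, you can also try round_date or ceiling_date. After that, we can use gather from the tidyr package to convert the data frame from wide to long format.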
library(dplyr)
library(tidyr)
library(lubridate)
dat2 <- dat %>%
mutate(Time = ymd_hms(Time),
Time2 = floor_date(Time, unit = "hour")) %>%
gather(LCZ, Value, starts_with("LCZ")) %>%
group_by(Time2, LCZ)
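To see how the three rounding helpers differ, here is a small sketch on a single made-up timestamp (the name obs_time is only for illustration and does not appear in the answer):

```r
library(lubridate)

# A single illustrative timestamp, 40 minutes past the hour
obs_time <- ymd_hms("2017-08-26 17:40:00", tz = "UTC")

floor_date(obs_time, unit = "hour")    # 2017-08-26 17:00:00 UTC (start of the hour)
round_date(obs_time, unit = "hour")    # 2017-08-26 18:00:00 UTC (nearest hour)
ceiling_date(obs_time, unit = "hour")  # 2017-08-26 18:00:00 UTC (next hour boundary)
```

With floor_date, a measurement taken at 17:40 is counted in the 17:00 group, which is probably the most common reading of "hourly".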
After that, we can summarise the data by LCZ and Time2.
dat3 <- dat2 %>%
summarise(Min = min(Value, na.rm = TRUE),
Max = max(Value, na.rm = TRUE),
Median = median(Value, na.rm = TRUE)) %>%
ungroup()
dat3
# # A tibble: 10 x 5
# Time2 LCZ Min Max Median
# <dttm> <chr> <dbl> <dbl> <dbl>
# 1 2017-08-26 17:00:00 LCZ.3.10 26.0 27.5 26.8
# 2 2017-08-26 17:00:00 LCZ.3.2 26.5 27.5 27.2
# 3 2017-08-26 17:00:00 LCZ.6.1 26.5 27.5 27.0
# 4 2017-08-26 17:00:00 LCZ.6.9 26.0 27.0 26.8
# 5 2017-08-26 17:00:00 LCZ.9.4 26.5 27.0 26.8
# 6 2017-08-26 18:00:00 LCZ.3.10 26.0 26.5 26.2
# 7 2017-08-26 18:00:00 LCZ.3.2 26.5 27.0 26.5
# 8 2017-08-26 18:00:00 LCZ.6.1 26.5 26.5 26.5
# 9 2017-08-26 18:00:00 LCZ.6.9 26.5 26.5 26.5
# 10 2017-08-26 18:00:00 LCZ.9.4 26.0 26.5 26.0
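The summary above does not yet carry the timestamps the question asks for. One possible way to add them, sketched here on a small toy subset of the LCZ 3-2 column, is to index Time with which.min and which.max inside summarise. The names long, hourly, MinTime and MaxTime are assumptions for this sketch, and the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)
library(lubridate)

# Toy long-format data for one location (values taken from the LCZ 3-2 column)
long <- data.frame(
  Time  = ymd_hms(c("2017-08-26 17:00:00", "2017-08-26 17:30:00",
                    "2017-08-26 17:40:00", "2017-08-26 18:00:00",
                    "2017-08-26 18:10:00")),
  LCZ   = "LCZ.3.2",
  Value = c(27.5, 27.0, 26.5, 26.5, 27.0)
)

hourly <- long %>%
  mutate(Time2 = floor_date(Time, unit = "hour")) %>%
  group_by(Time2, LCZ) %>%
  summarise(Min     = min(Value, na.rm = TRUE),
            MinTime = Time[which.min(Value)],   # first timestamp at which the minimum occurs
            Max     = max(Value, na.rm = TRUE),
            MaxTime = Time[which.max(Value)],   # first timestamp at which the maximum occurs
            Median  = median(Value, na.rm = TRUE),
            .groups = "drop")
hourly
```

which.min and which.max skip NA values and return only the first matching position, so ties are resolved in favour of the earliest timestamp. A group that consists only of NA values would need separate handling.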
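If needed, we can instead create binary indicator columns that flag whether a value equals the hourly minimum, maximum, or median, as follows. This format is useful if you want to filter the data frame further.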
dat4 <- dat2 %>%
mutate(Min = (Value == min(Value, na.rm = TRUE)) + 0L,
Max = (Value == max(Value, na.rm = TRUE)) + 0L,
Median = (Value == median(Value, na.rm = TRUE)) + 0L) %>%
ungroup()
dat4
# # A tibble: 50 x 7
# Time Time2 LCZ Value Min Max Median
# <dttm> <dttm> <chr> <dbl> <int> <int> <int>
# 1 2017-08-26 17:00:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 2 2017-08-26 17:10:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 3 2017-08-26 17:20:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 4 2017-08-26 17:30:00 2017-08-26 17:00:00 LCZ.3.2 27.0 0 0 0
# 5 2017-08-26 17:40:00 2017-08-26 17:00:00 LCZ.3.2 26.5 1 0 0
# 6 2017-08-26 17:50:00 2017-08-26 17:00:00 LCZ.3.2 26.5 1 0 0
# 7 2017-08-26 18:00:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# 8 2017-08-26 18:10:00 2017-08-26 18:00:00 LCZ.3.2 27.0 0 1 0
# 9 2017-08-26 18:20:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# 10 2017-08-26 18:30:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# # ... with 40 more rows
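As a rough illustration of that filtering step, the sketch below uses a hand-built data frame shaped like dat4 (the names flagged, min_times and first_min are made up for this example) and pulls out the timestamps at which the hourly minimum was observed.

```r
library(dplyr)

# Hand-built stand-in for a few rows of dat4 (one location, one hour)
flagged <- data.frame(
  Time  = c("2017-08-26 17:00:00", "2017-08-26 17:10:00",
            "2017-08-26 17:40:00", "2017-08-26 17:50:00"),
  Time2 = "2017-08-26 17:00:00",
  LCZ   = "LCZ.3.2",
  Value = c(27.5, 27.5, 26.5, 26.5),
  Min   = c(0L, 0L, 1L, 1L),
  Max   = c(1L, 1L, 0L, 0L)
)

# Every timestamp at which the hourly minimum was observed
min_times <- flagged %>%
  filter(Min == 1L) %>%
  select(Time2, LCZ, Value, Time)

# Only the first such timestamp per hour and location
first_min <- min_times %>%
  group_by(Time2, LCZ) %>%
  slice(1) %>%
  ungroup()
```

Keeping all flagged rows preserves ties, which a single-timestamp summary would hide.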
Data
dat <- read.table(text = "Time 'LCZ 3-2' 'LCZ 3-10' 'LCZ 6-1' 'LCZ 6-9' 'LCZ 9-4'
'2017-08-26 17:00:00' 27.5 27.5 27.5 27.0 27.0
'2017-08-26 17:10:00' 27.5 27.0 27.5 27.0 27.0
'2017-08-26 17:20:00' 27.5 27.0 27.0 27.0 27.0
'2017-08-26 17:30:00' 27.0 26.5 27.0 26.5 26.5
'2017-08-26 17:40:00' 26.5 26.5 26.5 26.5 26.5
'2017-08-26 17:50:00' 26.5 26.0 26.5 26.0 26.5
'2017-08-26 18:00:00' 26.5 26.0 26.5 26.5 26.5
'2017-08-26 18:10:00' 27.0 26.0 26.5 26.5 26.0
'2017-08-26 18:20:00' 26.5 26.5 26.5 26.5 26.0
'2017-08-26 18:30:00' 26.5 26.5 26.5 26.5 26.0",
header = TRUE, stringsAsFactors = FALSE)
Answer 1 (score: 3)
Here is a way to do this using dplyr verbs:
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
  gather(Location, Temp, -Time) %>%                              # wide to long
  group_by(Date = date(Time), HoD = hour(Time), Location) %>%    # one group per day, hour and location
  mutate_at(.vars = "Temp", .funs = list(Min = min, Max = max, Median = median)) %>%
  filter(Temp == Min | Temp == Max) %>%                          # keep only rows at the extremes
  arrange(Location, Time) %>%
  distinct(Temp, .keep_all = TRUE) %>%                           # first occurrence of each extreme
  mutate(MinMax = ifelse(Temp == Min, "MinTime", "MaxTime")) %>%
  dplyr::select(-Temp) %>%
  spread("MinMax", "Time")                                       # one column per kind of timestamp
Output:
Note the NA values: they indicate that the minimum and maximum temperature were the same for that day, hour, and location.
# A tibble: 10 x 8
# Groups: Date, HoD, Location [10]
Location Date HoD Min Max Median MaxTime MinTime
<chr> <date> <int> <dbl> <dbl> <dbl> <chr> <chr>
1 LCZ.3.10 2017-08-26 17 26.0 27.5 26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
2 LCZ.3.10 2017-08-26 18 26.0 26.5 26.2 2017-08-26 18:20:00 2017-08-26 18:00:00
3 LCZ.3.2 2017-08-26 17 26.5 27.5 27.2 2017-08-26 17:00:00 2017-08-26 17:40:00
4 LCZ.3.2 2017-08-26 18 26.5 27.0 26.5 2017-08-26 18:10:00 2017-08-26 18:00:00
5 LCZ.6.1 2017-08-26 17 26.5 27.5 27.0 2017-08-26 17:00:00 2017-08-26 17:40:00
6 LCZ.6.1 2017-08-26 18 26.5 26.5 26.5 NA 2017-08-26 18:00:00
7 LCZ.6.9 2017-08-26 17 26.0 27.0 26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
8 LCZ.6.9 2017-08-26 18 26.5 26.5 26.5 NA 2017-08-26 18:00:00
9 LCZ.9.4 2017-08-26 17 26.5 27.0 26.8 2017-08-26 17:00:00 2017-08-26 17:30:00
10 LCZ.9.4 2017-08-26 18 26.0 26.5 26.0 2017-08-26 18:00:00 2017-08-26 18:10:00
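If a timestamp is wanted in both columns even when the minimum equals the maximum, one alternative is to pick the extreme rows separately and join them. The sketch below assumes dplyr 1.0 or later (for slice_min and slice_max) and uses a made-up constant-temperature hour; the names long, min_rows, max_rows and extremes are not part of the answer.

```r
library(dplyr)

# Toy data: one location and hour in which the temperature never changes
long <- data.frame(
  Location = "LCZ.6.1",
  HoD      = 18L,
  Time     = c("2017-08-26 18:00:00", "2017-08-26 18:10:00",
               "2017-08-26 18:20:00", "2017-08-26 18:30:00"),
  Temp     = c(26.5, 26.5, 26.5, 26.5)
)

# One row per group holding the minimum and a timestamp where it occurs
min_rows <- long %>%
  group_by(Location, HoD) %>%
  slice_min(Temp, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Location, HoD, Min = Temp, MinTime = Time)

# Same for the maximum
max_rows <- long %>%
  group_by(Location, HoD) %>%
  slice_max(Temp, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Location, HoD, Max = Temp, MaxTime = Time)

extremes <- inner_join(min_rows, max_rows, by = c("Location", "HoD"))
extremes
```

Because the minimum and maximum are selected independently, neither timestamp column ends up NA for a constant hour.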
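Answer 2 (score: 2)
Here is a tidyverse solution.
Explanation: we create a new column, Time.hour, holding the time floored to the hour, which we can group by; then we calculate the necessary summary statistics.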
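library(dplyr)
library(tidyr)
res <- df %>%
  mutate(Time = as.POSIXct(Time, format = "%Y-%m-%d %H:%M:%S")) %>%  # Time as POSIXct
  gather(location, value, -Time) %>%                                 # wide to long
  mutate(Time.hour = format(Time, "%y-%m-%d %H")) %>%                # hour label (two-digit year)
  group_by(Time.hour, location) %>%
  summarise(min = min(value, na.rm = TRUE),                          # na.rm: the real data contain NA
            max = max(value, na.rm = TRUE),
            median = median(value, na.rm = TRUE))
res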
## A tibble: 10 x 5
## Groups: Time.hour [?]
# Time.hour location min max median
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 17-08-26 17 LCZ.3.10 26.0 27.5 26.8
# 2 17-08-26 17 LCZ.3.2 26.5 27.5 27.2
# 3 17-08-26 17 LCZ.6.1 26.5 27.5 27.0
# 4 17-08-26 17 LCZ.6.9 26.0 27.0 26.8
# 5 17-08-26 17 LCZ.9.4 26.5 27.0 26.8
# 6 17-08-26 18 LCZ.3.10 26.0 26.5 26.2
# 7 17-08-26 18 LCZ.3.2 26.5 27.0 26.5
# 8 17-08-26 18 LCZ.6.1 26.5 26.5 26.5
# 9 17-08-26 18 LCZ.6.9 26.5 26.5 26.5
#10 17-08-26 18 LCZ.9.4 26.0 26.5 26.0
If needed, convert to wide format:
res %>%
ungroup() %>%
gather(what, val, min:median) %>%
unite(key, what, location) %>%
spread(key, val)
## A tibble: 2 x 16
# Time.hour max_LCZ.3.10 max_LCZ.3.2 max_LCZ.6.1 max_LCZ.6.9 max_LCZ.9.4
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 17-08-26 17 27.5 27.5 27.5 27.0 27.0
#2 17-08-26 18 26.5 27.0 26.5 26.5 26.5
## ... with 10 more variables: median_LCZ.3.10 <dbl>, median_LCZ.3.2 <dbl>,
## median_LCZ.6.1 <dbl>, median_LCZ.6.9 <dbl>, median_LCZ.9.4 <dbl>,
## min_LCZ.3.10 <dbl>, min_LCZ.3.2 <dbl>, min_LCZ.6.1 <dbl>,
## min_LCZ.6.9 <dbl>, min_LCZ.9.4 <dbl>
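In current tidyr, gather and spread are superseded, and the same reshaping can probably be written in one step with pivot_wider. The following is a sketch on a hand-typed excerpt of res (two locations only), assuming tidyr 1.0 or later; the name wide is made up for this example.

```r
library(tidyr)

# Hand-typed excerpt of the summary table (two locations, two hours)
res <- data.frame(
  Time.hour = c("17-08-26 17", "17-08-26 17", "17-08-26 18", "17-08-26 18"),
  location  = c("LCZ.3.2", "LCZ.9.4", "LCZ.3.2", "LCZ.9.4"),
  min       = c(26.5, 26.5, 26.5, 26.0),
  max       = c(27.5, 27.0, 27.0, 26.5),
  median    = c(27.25, 26.75, 26.5, 26.0)
)

# One row per hour; one column per statistic/location pair, e.g. min_LCZ.3.2
wide <- pivot_wider(res,
                    names_from  = location,
                    values_from = c(min, max, median))
wide
```

With several values_from columns, pivot_wider prefixes each new column with the statistic name, which matches the key built by unite above.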
df <- read.table(text =
"Time 'LCZ 3-2' 'LCZ 3-10' 'LCZ 6-1' 'LCZ 6-9' 'LCZ 9-4'
1 '2017-08-26 17:00:00' 27.5 27.5 27.5 27.0 27.0
2 '2017-08-26 17:10:00' 27.5 27.0 27.5 27.0 27.0
3 '2017-08-26 17:20:00' 27.5 27.0 27.0 27.0 27.0
4 '2017-08-26 17:30:00' 27.0 26.5 27.0 26.5 26.5
5 '2017-08-26 17:40:00' 26.5 26.5 26.5 26.5 26.5
6 '2017-08-26 17:50:00' 26.5 26.0 26.5 26.0 26.5
7 '2017-08-26 18:00:00' 26.5 26.0 26.5 26.5 26.5
8 '2017-08-26 18:10:00' 27.0 26.0 26.5 26.5 26.0
9 '2017-08-26 18:20:00' 26.5 26.5 26.5 26.5 26.0
10 '2017-08-26 18:30:00' 26.5 26.5 26.5 26.5 26.0", header = T, row.names = 1)
Answer 3 (score: 2)
I am not sure in which format the OP wants the result to be displayed. One solution uses mutate_at:
library(lubridate)
library(dplyr)
result <- df %>% mutate(Time = ymd_hms(Time)) %>%
  group_by(Hourly = format(Time, "%Y%m%d%H")) %>%
  # list() is used here because funs() is deprecated in current dplyr
  mutate_at(vars(starts_with("LCZ")), list(min = min, max = max, med = median)) %>%
  # Time and Hourly first, then all remaining columns sorted by name
  select(Time, Hourly, all_of(sort(setdiff(names(.), c("Time", "Hourly")))))
result[,1:9]
# # A tibble: 10 x 9
# # Groups: Hourly [2]
# Time Hourly LCZ3_02 LCZ3_02_max LCZ3_02_med LCZ3_02_min LCZ3_10 LCZ3_10_max LCZ3_10_med
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-08-26 17:00:00 2017082617 27.5 27.5 27.2 26.5 27.5 27.5 26.8
# 2 2017-08-26 17:10:00 2017082617 27.5 27.5 27.2 26.5 27.0 27.5 26.8
# 3 2017-08-26 17:20:00 2017082617 27.5 27.5 27.2 26.5 27.0 27.5 26.8
# 4 2017-08-26 17:30:00 2017082617 27.0 27.5 27.2 26.5 26.5 27.5 26.8
# 5 2017-08-26 17:40:00 2017082617 26.5 27.5 27.2 26.5 26.5 27.5 26.8
# 6 2017-08-26 17:50:00 2017082617 26.5 27.5 27.2 26.5 26.0 27.5 26.8
# 7 2017-08-26 18:00:00 2017082618 26.5 27.0 26.5 26.5 26.0 26.5 26.2
# 8 2017-08-26 18:10:00 2017082618 27.0 27.0 26.5 26.5 26.0 26.5 26.2
# 9 2017-08-26 18:20:00 2017082618 26.5 27.0 26.5 26.5 26.5 26.5 26.2
# 10 2017-08-26 18:30:00 2017082618 26.5 27.0 26.5 26.5 26.5 26.5 26.2
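In dplyr 1.0 and later, mutate_at is superseded by across. A minimal sketch of the same idea, on a made-up three-row data frame with two locations and a precomputed Hourly key (the values are illustrative, not a full copy of the data), could look like this:

```r
library(dplyr)

# Small illustrative input: two locations, hour key already computed
df <- data.frame(
  Hourly  = c("2017082617", "2017082617", "2017082618"),
  LCZ3_02 = c(27.5, 26.5, 27.0),
  LCZ3_10 = c(27.5, 26.0, 26.0)
)

result <- df %>%
  group_by(Hourly) %>%
  # Apply each function to every LCZ column; new columns are named <column>_<function>
  mutate(across(starts_with("LCZ"),
                list(min = min, max = max, med = median),
                .names = "{.col}_{.fn}")) %>%
  ungroup()
result
```

Every original row is kept, and the hourly statistics are repeated on each row of the corresponding hour, just as with mutate_at.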
Data
df <- read.table(text =
"Time LCZ3_02 LCZ3_10 LCZ6_01 LCZ6_09 LCZ9_04
1 '2017-08-26 17:00:00' 27.5 27.5 27.5 27.0 27.0
2 '2017-08-26 17:10:00' 27.5 27.0 27.5 27.0 27.0
3 '2017-08-26 17:20:00' 27.5 27.0 27.0 27.0 27.0
4 '2017-08-26 17:30:00' 27.0 26.5 27.0 26.5 26.5
5 '2017-08-26 17:40:00' 26.5 26.5 26.5 26.5 26.5
6 '2017-08-26 17:50:00' 26.5 26.0 26.5 26.0 26.5
7 '2017-08-26 18:00:00' 26.5 26.0 26.5 26.5 26.5
8 '2017-08-26 18:10:00' 27.0 26.0 26.5 26.5 26.0
9 '2017-08-26 18:20:00' 26.5 26.5 26.5 26.5 26.0
10 '2017-08-26 18:30:00' 26.5 26.5 26.5 26.5 26.0",
header = TRUE, stringsAsFactors = FALSE)