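I am working with a dataset of monthly temperature values for about 800 weather stations, covering 1986 to 2014. The data are in three columns: (1) station name, (2) date (year and month, as YYYYMM), and (3) temperature. In general, the data look like this: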
STATION DATE TEMP
Station 1 198601 -15
Station 1 198602 -16
Station 1 201401 -10
Station 1 201402 -14
Station 2 198601 -11
Station 2 198602 -9
Station 2 201401 -5
Station 2 201402 -4
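I need to extract, for each station, the mean temperature of a given month over different ranges of years. For example, I might need each station's mean July temperature for 1986-1990. My ideal output would be a new list or data frame giving each station's mean temperature over the date range I specify.
I'm sure this could be done with a for loop, but I'm not very proficient at writing that kind of code. Any suggestions would be greatly appreciated.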
Answer 0 (score: 2)
Using dplyr instead of data.table:
library(dplyr)
library(stringr)

weather <- data.frame(station = c("Station 1", "Station 1", "Station 1", "Station 1",
                                  "Station 2", "Station 2", "Station 2", "Station 2"),
                      date = c(198601, 198602, 201401, 201402, 198601, 198602, 201401, 201402),
                      temp = c(-15, -16, -10, -14, -11, -9, -5, -4))

# split the YYYYMM date into numeric year and month columns
weather <- mutate(weather,
                  year  = as.integer(str_extract(date, "^\\d{4}")),
                  month = as.integer(str_extract(date, "\\d{2}$")))

# mean temperature for each station and each month, over all years
mean_station <- group_by(weather, station, month) %>%
  summarise(mean_temp = mean(temp, na.rm = TRUE))
If you only need this for a specific range of years, you can add a filter on year:
# keep only the years of interest, then average per station and month
mean_station <- weather %>%
  filter(year >= 1986, year <= 2015) %>%
  group_by(station, month) %>%
  summarise(mean_temp = mean(temp, na.rm = TRUE))
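The question asks for one calendar month over a chosen year range rather than every month, so a small wrapper can make that a one-liner. This is a minimal sketch, assuming the date column holds YYYYMM numbers as in the example; `station_month_mean` is a hypothetical helper name, not part of dplyr, and it derives year and month arithmetically with `%/%` and `%%` instead of with stringr.

```r
library(dplyr)

# Toy data in the shape shown in the question (lowercase names, as in this answer)
weather <- data.frame(
  station = rep(c("Station 1", "Station 2"), each = 4),
  date    = rep(c(198601, 198602, 201401, 201402), times = 2),
  temp    = c(-15, -16, -10, -14, -11, -9, -5, -4)
)

# Hypothetical helper: mean temperature per station for calendar month `mon`,
# using only years from..to (inclusive). Assumes `date` is numeric YYYYMM.
station_month_mean <- function(df, mon, from, to) {
  df %>%
    mutate(year  = date %/% 100,    # 198607 -> 1986
           month = date %% 100) %>% # 198607 -> 7
    filter(month == mon, year >= from, year <= to) %>%
    group_by(station) %>%
    summarise(mean_temp = mean(temp, na.rm = TRUE), .groups = "drop")
}

# The question's case would be station_month_mean(weather, 7, 1986, 1990);
# the toy data only has January and February, so January is used here.
station_month_mean(weather, 1, 1986, 2014)
```

Stations with no rows for the requested month and years simply drop out of the result rather than appearing with `NaN`.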
Answer 1 (score: 1)
Something like this...?
> df$month <- substr(df$DATE, 5, 6)   # month as a two-character string, e.g. "01"
> result <- aggregate(TEMP ~ STATION + month, data = df, FUN = mean)
> result
    STATION month  TEMP
1 Station 1    01 -12.5
2 Station 2    01  -8.0
3 Station 1    02 -15.0
4 Station 2    02  -6.5
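To restrict the same base-R approach to one month and a range of years, one option is to subset the rows first and then aggregate by station only. A minimal sketch, assuming `DATE` is numeric YYYYMM; the variables `mon`, `from`, `to` and the toy data are illustrative, and the toy data has no July values, so January stands in for the question's July.

```r
# Toy data with the question's column names (STATION, DATE, TEMP)
df <- data.frame(
  STATION = rep(c("Station 1", "Station 2"), each = 4),
  DATE    = rep(c(198601, 198602, 201401, 201402), times = 2),
  TEMP    = c(-15, -16, -10, -14, -11, -9, -5, -4)
)

df$year  <- df$DATE %/% 100   # YYYYMM -> YYYY
df$month <- df$DATE %% 100    # YYYYMM -> month number

# Keep only the wanted month and year range, then average per station
mon <- 1; from <- 1986; to <- 2014
sel <- df[df$month == mon & df$year >= from & df$year <= to, ]
result <- aggregate(TEMP ~ STATION, data = sel, FUN = mean)
result
```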
Answer 2 (score: 1)
Alternatively:
library(data.table)
setDT(df)[, list(MeanTemp = mean(TEMP)),
by = list(STATION, Mon = substr(DATE, 5, 6))]
# STATION Mon MeanTemp
# 1: Station 1 01 -12.5
# 2: Station 1 02 -15.0
# 3: Station 2 01 -8.0
# 4: Station 2 02 -6.5
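In data.table the month and year restriction can go straight into the `i` (row filter) argument, with the mean in `j` and one group per station in `by`. A minimal sketch, assuming numeric YYYYMM dates; `mon`, `from` and `to` are illustrative variables, looked up in the calling scope because no columns have those names.

```r
library(data.table)

dt <- data.table(
  STATION = rep(c("Station 1", "Station 2"), each = 4),
  DATE    = rep(c(198601, 198602, 201401, 201402), times = 2),
  TEMP    = c(-15, -16, -10, -14, -11, -9, -5, -4)
)

mon <- 1; from <- 1986; to <- 1990
# i: rows in the chosen month and year range; j: the mean; by: per station
res <- dt[DATE %% 100 == mon & DATE %/% 100 >= from & DATE %/% 100 <= to,
          .(MeanTemp = mean(TEMP)), by = STATION]
res
```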
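Answer 3 (score: 1)
I'm also learning R and this may not answer your question directly, but I want to mention that the seas package is helpful for analyzing this kind of data.
For example: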
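library(seas)
pdf("test.pdf")                          # send all plots to one PDF file
stations <- unique(mdata$id)             # mdata: your data, already in the format seas expects
for (i in seq_along(stations)) {
  d1 <- mksub(mdata, id = stations[i])   # subset for each station by its name/unique id
  dat.ss <- seas.sum(d1)
  plot(dat.ss)
}
graphics.off()                           # close the PDF device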
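You have to make sure the structure of your dataset (check it with str()) matches the format seas expects. With a dataset this large, I'd suggest that loops and functions help speed up the analysis. If you have another way of doing the loop, I'd be grateful if you could share it.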
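For comparison, the plain for loop the question had in mind could look roughly like this: one pass per station, selecting that station's rows in the chosen month and year range. A minimal sketch, assuming numeric YYYYMM dates and the question's column names; `mon`, `from`, `to` and the toy data are illustrative. The grouped dplyr, aggregate and data.table answers above do the same work without an explicit loop and usually scale better to 800 stations.

```r
df <- data.frame(
  STATION = rep(c("Station 1", "Station 2"), each = 4),
  DATE    = rep(c(198601, 198602, 201401, 201402), times = 2),
  TEMP    = c(-15, -16, -10, -14, -11, -9, -5, -4)
)

mon <- 1; from <- 1986; to <- 2014
stations <- unique(df$STATION)
means <- numeric(length(stations))   # preallocate one slot per station

for (i in seq_along(stations)) {
  # rows for this station, in the chosen month, within the year range
  rows <- df$STATION == stations[i] &
          df$DATE %% 100 == mon &
          df$DATE %/% 100 >= from & df$DATE %/% 100 <= to
  means[i] <- mean(df$TEMP[rows], na.rm = TRUE)
}

result <- data.frame(STATION = stations, MeanTemp = means)
result
```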