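I have hourly precipitation data for many days. Is there a way in R to identify the hours when precipitation is greater than zero, add those values together, and divide by the number of hours it rained, to get the storm intensity (average rainfall rate)? I am new to R. I know how to get the average rainfall per day, but I would rather have a value for each rain event. Thanks.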
Answer 0 (score: 1)
The rle (run-length encoding) function is very useful for this kind of problem. Using @aaryno's lovely data:
dat <- read.csv(url('http://www.wunderground.com/history/airport/KBTV/2015/6/12/DailyHistory.html?req_city=Burlington&req_state=VT&req_statename=&reqdb.zip=05401&reqdb.magic=1&reqdb.wmo=99999&format=1'),stringsAsFactors=FALSE)
# What do you want to do with NA? Assume no rain for now.
dat$PrecipitationIn = as.numeric(dat$PrecipitationIn)
dat$PrecipitationIn[is.na(dat$PrecipitationIn)] = 0
precip = dat$PrecipitationIn
consec_precip = rle(precip > 0)
# calculates runs of consecutive hours of rain
# create an ID for each run of consecutive hours of rain
storm_id = rep(0, length(precip))
# seq_len() avoids the 1:0 pitfall when there are no rainy runs at all
storm_id[precip > 0] = rep(seq_len(sum(consec_precip$values)),
                           times = consec_precip$lengths[consec_precip$values])
# calculate mean precipitation within each consecutive rain period
tapply(precip, storm_id, mean)
# 0 corresponds to all the times with no rain
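To make the run-numbering step concrete, here is a minimal sketch on a made-up vector of hourly totals. The values and the names storms, hours, total_in and mean_in_hr are illustrative assumptions, not part of the original data. It also reports the total and the number of wet hours per storm, since the mean is simply their ratio.

```r
# Toy hourly precipitation in inches (made-up values for illustration)
precip <- c(0, 0.1, 0.3, 0, 0, 0.2, 0, 0.5, 0.5, 0.2)

runs <- rle(precip > 0)
# runs$values:  FALSE TRUE FALSE TRUE FALSE TRUE
# runs$lengths: 1     2    2     1    1     3

# Number each wet run 1, 2, 3, ...; dry hours keep ID 0
storm_id <- rep(0L, length(precip))
storm_id[precip > 0] <- rep(seq_len(sum(runs$values)),
                            times = runs$lengths[runs$values])

# Summarise the wet hours only, one row per storm
wet <- storm_id > 0
storms <- data.frame(
  storm      = sort(unique(storm_id[wet])),
  hours      = as.vector(tapply(precip[wet], storm_id[wet], length)),
  total_in   = as.vector(tapply(precip[wet], storm_id[wet], sum)),
  mean_in_hr = as.vector(tapply(precip[wet], storm_id[wet], mean))
)
print(storms)
```

Dropping the dry hours before summarising keeps the "storm 0" row out of the result, which may or may not be what you want.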
The rle approach relies on evenly spaced data; if the observation times are irregular, a more sophisticated approach is needed.
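One possible way to handle irregular timestamps, offered only as a sketch: start a new event whenever a wet observation follows a dry one or follows a gap longer than some threshold. The data below are made up, and max_gap_hours is a hypothetical tuning parameter that the original answer does not define.

```r
# Made-up, irregularly spaced observations (times in UTC)
obs <- data.frame(
  time = as.POSIXct(c("2015-06-12 18:54:00", "2015-06-12 19:10:00",
                      "2015-06-12 19:54:00", "2015-06-12 22:49:00",
                      "2015-06-12 23:05:00", "2015-06-13 03:54:00"),
                    tz = "UTC"),
  precip = c(0.01, 0.05, 0, 0.20, 0.10, 0.30)
)

max_gap_hours <- 1.5  # hypothetical threshold; tune for your data

wet       <- obs$precip > 0
prev_wet  <- c(FALSE, head(wet, -1))
gap_hours <- c(Inf, as.numeric(diff(obs$time), units = "hours"))

# A new event starts at a wet observation that follows a dry one
# or follows a gap longer than the threshold
new_event <- wet & (!prev_wet | gap_hours > max_gap_hours)
obs$event <- ifelse(wet, cumsum(new_event), 0L)

# Total precipitation per event (dry observations excluded)
event_totals <- aggregate(precip ~ event, data = obs[wet, ], FUN = sum)
print(event_totals)
```

With these inputs the last observation becomes its own event because it is more than 1.5 hours after the previous one, even though no dry observation was recorded in between.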
Answer 1 (score: 0)
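I downloaded the CSV from the link at the bottom of the URL you posted and got something like the following, which I use for my example. Note that the DateUTC field in the last column contains some junk that I had to get rid of.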
> str(dat)
'data.frame': 45 obs. of 15 variables:
$ TimeEDT : chr "12:54 AM" "1:54 AM" "2:54 AM" "3:54 AM" ...
$ TemperatureF : num 62.1 62.1 60.8 61 62.1 62.1 62.1 64.9 66.9 69.1 ...
$ Dew.PointF : num 55.9 55 55.4 55.9 55.9 55.9 57 55.9 57 57 ...
$ Humidity : int 80 78 82 83 80 80 84 73 70 65 ...
$ Sea.Level.PressureIn: num 29.9 29.9 29.9 29.9 29.9 ...
$ VisibilityMPH : num 10 10 10 10 10 10 10 10 10 10 ...
$ Wind.Direction : chr "Calm" "SE" "Calm" "Calm" ...
$ Wind.SpeedMPH : chr "Calm" "3.5" "Calm" "Calm" ...
$ Gust.SpeedMPH : chr "-" "-" "-" "-" ...
$ PrecipitationIn : num 0 0 0 0 0 0 0 0 0 0 ...
$ Events : chr "" "" "" "" ...
$ Conditions : chr "Clear" "Partly Cloudy" "Clear" "Overcast" ...
$ WindDirDegrees : int 0 140 0 0 0 0 0 180 200 170 ...
$ DateUTC.br... : chr "2015-06-12 04:54:00<br />" "2015-06-12 05:54:00<br />" "2015-06-12 06:54:00<br />" "2015-06-12 07:54:00<br />" ...
To get the intensity of each precipitation event from this data.frame:
dat <- read.csv(url('http://www.wunderground.com/history/airport/KBTV/2015/6/12/DailyHistory.html?req_city=Burlington&req_state=VT&req_statename=&reqdb.zip=05401&reqdb.magic=1&reqdb.wmo=99999&format=1'),stringsAsFactors=FALSE)
# What do you want to do with NA? Assume no rain for now.
dat$PrecipitationIn <- as.numeric(dat$PrecipitationIn)
dat$PrecipitationIn[is.na(dat$PrecipitationIn)] <- 0
# Just look for changes in the sequence where precip starts or stops
# and adjust for boundary effects
rainingAtStart<-dat$PrecipitationIn[1]>0
dif<-c(rainingAtStart,diff(dat$PrecipitationIn>0))
startEvent <- which(dif>0)
endEvent <- which(dif<0)
# Still raining at the last observation: close the final event there
if (dat$PrecipitationIn[nrow(dat)] > 0) {
  endEvent <- c(endEvent, nrow(dat))
}
X <- data.frame(cbind(startEvent,endEvent,
dat$DateUTC.br...[startEvent],
dat$DateUTC.br...[endEvent]))
names(X) <- c("indStart","indEnd","eventStart","eventEnd")
# Calculate the sum for each precip event
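# (index with the integer vectors directly instead of the character matrix that apply() creates)
precipByEvent <- mapply(function(s, e) sum(dat$PrecipitationIn[s:e]), startEvent, endEvent)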
X$eventTotal <- precipByEvent
str(X)
'data.frame': 3 obs. of 5 variables:
$ indStart : Factor w/ 3 levels "15","19","28": 1 2 3
$ indEnd : Factor w/ 3 levels "16","27","45": 1 2 3
$ eventStart: Factor w/ 3 levels "2015-06-12 18:54:00<br />",..: 1 2 3
$ eventEnd : Factor w/ 3 levels "2015-06-12 20:54:00<br />",..: 1 2 3
$ eventTotal: num 0.01 1.12 4.65
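As a minimal sketch of the start/stop detection used above, the same logic can be run on a made-up vector that is wet at both the first and the last observation, which exercises both boundary adjustments. The vector p and the data frame events are illustrative names only.

```r
# Made-up series that is wet at both the first and the last observation
p   <- c(0.2, 0.1, 0, 0, 0.3, 0, 0.4, 0.1)
wet <- p > 0

# +1 where rain starts, -1 at the first dry observation after rain;
# the leading element marks an event that is already under way at row 1
dif <- c(as.integer(wet[1]), diff(wet))
startEvent <- which(dif > 0)
endEvent   <- which(dif < 0)

# Still raining at the last observation: close the final event there
if (wet[length(wet)]) endEvent <- c(endEvent, length(wet))

events <- data.frame(indStart = startEvent, indEnd = endEvent)
# indEnd is the first dry row after the event (or the last row),
# so including it in the sum adds zero
events$eventTotal <- mapply(function(s, e) sum(p[s:e]),
                            events$indStart, events$indEnd)
print(events)
```

Note that indEnd points at the first dry row following each event, except for an event that runs to the end of the record, where it points at the last wet row.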
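eventStart and eventEnd contain some stray HTML because the data came straight from the CSV link at the URL you provided, and the columns are factors, so let's fix that and convert them to time objects. Base R provides time-based functionality through the POSIXct class, so no extra libraries are needed.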
X$eventStart <- gsub('<br />','',X$eventStart)
X$eventEnd <- gsub('<br />','',X$eventEnd)
Ideally these would be time objects (POSIXct) rather than chr objects, which lets you do arithmetic on them:
# The source column is DateUTC, so parse the timestamps as UTC
X$eventStart <- as.POSIXct(X$eventStart, format="%Y-%m-%d %H:%M:%S", tz="UTC")
X$eventEnd <- as.POSIXct(X$eventEnd, format="%Y-%m-%d %H:%M:%S", tz="UTC")
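Now you can get the intensity by dividing the total by the event duration. The duration is rounded up somewhat, because we assume precipitation starts at the beginning of the first wet observation period and ends at the end of the last one; how you interpret that is up to you.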
X$inchesPerHour <- X$eventTotal / (as.double(difftime(X$eventEnd,X$eventStart,units="hours")))
str(X)
'data.frame': 3 obs. of 7 variables:
$ indStart : Factor w/ 3 levels "15","19","28": 1 2 3
$ indEnd : Factor w/ 3 levels "16","27","45": 1 2 3
$ eventStart : POSIXct, format: "2015-06-12 18:54:00" "2015-06-12 22:49:00" "2015-06-13 01:31:00"
$ eventEnd : POSIXct, format: "2015-06-12 20:54:00" "2015-06-13 00:50:00" "2015-06-13 03:54:00"
$ eventTotal : num 0.01 1.12 4.65
$ inchesPerHour: num 0.005 0.555 1.951
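Now your X data.frame holds each event's start and end times, the positions (rows) of the start and end in the original data source, the event total (inches), and the intensity (inches per hour).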
A note on intensity and event duration:
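Event duration is somewhat overestimated, because we assume that rain starts at the beginning of the sampling period in which precipitation is first reported and ends at the beginning of the next period in which there is no precipitation. A 5-minute event that starts and stops between samples (measurements, i.e. rows) is therefore recorded with a one-hour duration. More interestingly, a 5-minute event that straddles a measurement (say it rains from 2 minutes before the measurement until 3 minutes after it) is treated as a two-hour event.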
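The size of this effect can be worked through with some hypothetical numbers. The sketch below assumes hourly sampling and a made-up total of 0.10 inches for the 5-minute shower; none of these figures come from the data set.

```r
# Hypothetical numbers, assuming hourly sampling
sample_interval_hr <- 1
true_duration_hr   <- 5 / 60   # a 5-minute shower
total_in           <- 0.10     # made-up total for the shower

# Shower falls entirely within one sampling period: 1 wet report
# Shower straddles a measurement: 2 wet reports
wet_reports <- c(within_one_period = 1, straddling = 2)
assumed_duration_hr <- wet_reports * sample_interval_hr

cmp <- data.frame(
  assumed_duration_hr,
  assumed_in_per_hr = total_in / assumed_duration_hr,
  true_in_per_hr    = total_in / true_duration_hr
)
print(cmp)
```

Under these assumptions the computed intensity is 0.10 or 0.05 inches per hour, against a true rate of 1.2 inches per hour while it was actually raining, so short events are the ones whose intensity is most understated.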