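I have data on every avalanche that has occurred. I need to count the number of avalanches per year and month, but the data only give the exact date of each avalanche. How can I total the number of occurrences for each year-month? I also only need the year-months that fall in winter (December (12) through March (3)). Please help!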
library(XML)
library(RCurl)
library(dplyr)
# Scrape each page of the observations table and stack the rows
avalanche <- data.frame()
avalanche.url <- "https://utahavalanchecenter.org/observations?page="
all.pages <- 0:202
for (page in all.pages) {
  this.url <- paste(avalanche.url, page, sep = "")
  this.webpage <- htmlParse(getURL(this.url))
  thispage.avalanche <- readHTMLTable(this.webpage, which = 1, header = TRUE, stringsAsFactors = FALSE)
  names(thispage.avalanche) <- c('Date', 'Region', 'Location', 'Observer')
  avalanche <- rbind(avalanche, thispage.avalanche)
}
# subset the data to the Salt Lake Region
avalancheslc<-subset(avalanche, Region=="Salt Lake")
str(avalancheslc)
The output should look something like:
Date AvalancheTotal
2000-01 1
2000-02 2
2000-03 8
2000-12 23
2001-01 16
.
.
.
.
.
2019-03 45
Answer 0 (score: 0)
We can convert the dates to the yearmon class from zoo and use that inside group_by to count the rows in each month
library(dplyr)
library(zoo)
dim(avalancheslc)
#[1] 5494 4
out <- avalancheslc %>%
  group_by(Date = format(as.yearmon(Date, "%m/%d/%Y"), "%Y-%m")) %>%
  summarise(AvalancheTotal = n())
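As a rough illustration of the same idea without zoo or dplyr, the sketch below counts rows per year-month in base R. The toy data frame is hypothetical and only assumes, as the answer does, that Date is stored as "%m/%d/%Y" text.

```r
# Hypothetical toy data in the "%m/%d/%Y" format assumed by the answer
toy <- data.frame(
  Date = c("01/05/2000", "01/20/2000", "02/11/2000", "12/25/2000", "07/04/2001"),
  stringsAsFactors = FALSE
)

# Parse the strings, then reduce each date to a "YYYY-MM" key
ym <- format(as.Date(toy$Date, format = "%m/%d/%Y"), "%Y-%m")

# table() counts the rows per key; as.data.frame() turns it into two columns
monthly <- as.data.frame(table(Date = ym),
                         responseName = "AvalancheTotal",
                         stringsAsFactors = FALSE)
print(monthly)
```

This swaps group_by/summarise for table(), which is a different tool but produces the same kind of Date / AvalancheTotal summary.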
If we only need the output from December to March, then filter the data
subOut <- out %>%
  filter(as.integer(substr(Date, 6, 7)) %in% c(12, 1:3))
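To see why the substr call picks out the month, here is a small base R sketch on a hypothetical vector of "YYYY-MM" keys: characters 6 and 7 are the two month digits, and as.integer drops the leading zero.

```r
# Hypothetical year-month keys in the "YYYY-MM" layout produced above
keys <- c("2000-01", "2000-07", "2000-12", "2001-03", "2001-04")

# Characters 6-7 hold the month; as.integer("01") gives 1
month_num <- as.integer(substr(keys, 6, 7))

# Keep only December through March
winter <- keys[month_num %in% c(12, 1:3)]
print(winter)
```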
Or the filter can be applied earlier in the chain
library(lubridate)
out <- avalancheslc %>%
  mutate(Date = as.yearmon(Date, "%m/%d/%Y")) %>%
  filter(month(Date) %in% c(12, 1:3)) %>%
  count(Date)
dim(out)
#[1] 67 2
Now fill in the winter months that have no avalanches with 0
library(tidyr)  # provides crossing(), unite() and replace_na()
mths <- month.abb[c(12, 1:3)]
out1 <- crossing(Months = mths,
                 Year = year(min(out$Date)):year(max(out$Date))) %>%
  unite(Date, Months, Year, sep = " ") %>%
  mutate(Date = as.yearmon(Date)) %>%
  left_join(out, by = "Date") %>%
  mutate(n = replace_na(n, 0))
tail(out1)
# A tibble: 6 x 2
# Date n
# <S3: yearmon> <dbl>
#1 Mar 2014 100
#2 Mar 2015 94
#3 Mar 2016 96
#4 Mar 2017 93
#5 Mar 2018 126
#6 Mar 2019 163
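The zero-filling step can also be sketched in base R: build every winter year-month combination, left-join the observed counts onto it, and replace the resulting NA values with 0. The counts data frame below is made up for illustration, and merge() stands in for the crossing/left_join combination used above.

```r
# Hypothetical observed counts; several winter months are missing
counts <- data.frame(Date = c("2000-01", "2000-03", "2001-12"),
                     n = c(4L, 2L, 7L),
                     stringsAsFactors = FALSE)

# Every winter month (Dec, Jan, Feb, Mar) for every year in range
grid <- expand.grid(Year = 2000:2001, Month = c(12L, 1:3))
grid$Date <- sprintf("%d-%02d", grid$Year, grid$Month)

# Left join: keep all grid rows, attach counts where they exist
filled <- merge(grid["Date"], counts, by = "Date", all.x = TRUE)

# Months with no observations come back as NA; turn them into 0
filled$n[is.na(filled$n)] <- 0L
filled <- filled[order(filled$Date), ]
print(filled)
```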
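Answer 1 (score: 0)
Using dplyr, you can derive the variable of interest ("year-month") from the Date column, group by that variable, and then count the rows in each group. In a similar way, you can filter to keep only the months you need: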
library(dplyr)
winter_months <- c(1:3, 12)
avalancheslc %>%
  mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
  mutate(YearMonth = format(Date, "%Y-%m"),
         Month = as.numeric(format(Date, "%m"))) %>%
  filter(Month %in% winter_months) %>%
  group_by(YearMonth) %>%
  summarise(AvalancheTotal = n())
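The question also asks for per-year totals. A minimal base R sketch of that, on hypothetical toy dates, is to filter to the winter months first and then use "%Y" as the grouping key instead of "%Y-%m". Note the assumption here: December is counted under its own calendar year, not under the following winter season.

```r
# Hypothetical toy data in "%m/%d/%Y" format
toy <- data.frame(
  Date = c("12/25/2000", "01/05/2001", "02/11/2001", "07/04/2001", "01/15/2002"),
  stringsAsFactors = FALSE
)

d <- as.Date(toy$Date, format = "%m/%d/%Y")

# Keep only December through March
month_num <- as.integer(format(d, "%m"))
winter <- d[month_num %in% c(12, 1:3)]

# Count winter avalanches per calendar year
yearly <- as.data.frame(table(Year = format(winter, "%Y")),
                        responseName = "AvalancheTotal",
                        stringsAsFactors = FALSE)
print(yearly)
```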