Summing totals from m/d/y dates to y/m

Time: 2019-03-28 20:29:18

Tags: r

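I have data on every avalanche that occurred. I need to count the number of avalanches per year and month, but the data only gives the exact date of each avalanche. How can I total the number of occurrences for each month of each year? I also only need the year-months that fall in winter (December (12) through March (3)). Please help!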

library(XML)
library(RCurl)
library(dplyr)
avalanche<-data.frame()
avalanche.url<-"https://utahavalanchecenter.org/observations?page="
all.pages<-0:202
for(page in all.pages){
  this.url<-paste(avalanche.url, page, sep="")
  this.webpage<-htmlParse(getURL(this.url))
  thispage.avalanche<-readHTMLTable(this.webpage, which=1, header=T,stringsAsFactors=F)
  names(thispage.avalanche)<-c('Date','Region','Location','Observer')
  avalanche<-rbind(avalanche,thispage.avalanche)
}

# subset the data to the Salt Lake Region
avalancheslc<-subset(avalanche, Region=="Salt Lake")
str(avalancheslc)

The output should look something like:

Date       AvalancheTotal
2000-01           1
2000-02           2
2000-03           8
2000-12           23
2001-01           16
.
.
.
.
.
2019-03            45

2 Answers:

Answer 0 (score: 0)

We can convert Date to yearmon (from the zoo package) and use it in group_by to get the number of rows per year-month

library(dplyr)
library(zoo)

dim(avalancheslc)
#[1] 5494    4
out <- avalancheslc %>% 
          group_by(Date = format(as.yearmon(Date, "%m/%d/%Y"), "%Y-%m")) %>% 
          summarise(AvalancheTotal = n())

If we only need the output for December through March, then filter the data

subOut <- out %>%
            filter(as.integer(substr(Date, 6, 7)) %in% c(12, 1:3))
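
To see what the substr() filter does, here is a minimal base R sketch on a few made-up "YYYY-MM" keys. The `keys` vector is hypothetical and not taken from the scraped data; the only assumption is that the month sits in characters 6 and 7, as it does after `format(..., "%Y-%m")`.

```r
# Hypothetical year-month keys in "YYYY-MM" form
keys <- c("2000-01", "2000-07", "2000-12", "2001-03", "2001-04")

# Characters 6-7 hold the month; convert to integer for comparison
month_num <- as.integer(substr(keys, 6, 7))

# Keep December through March only
winter <- keys[month_num %in% c(12, 1:3)]
winter
```

Only the January, December and March keys should remain; July and April are dropped.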

Or we can filter earlier in the chain

library(lubridate)
out <- avalancheslc %>%
         mutate(Date = as.yearmon(Date, "%m/%d/%Y")) %>%
         filter(month(Date) %in% c(12, 1:3))  %>% 
         count(Date)
dim(out)
#[1] 67  2

Now, fill in the winter months that have no observations with 0

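library(tidyr)  # crossing(), unite() and replace_na() come from tidyr
mths <- month.abb[c(12, 1:3)]
# build every winter month for every year in the observed range
out1 <- crossing(Months = mths, 
            Year = year(min(out$Date)):year(max(out$Date))) %>%
       unite(Date, Months, Year, sep= " ") %>% 
       mutate(Date = as.yearmon(Date)) %>% 
       left_join(out, by = "Date") %>% 
       mutate(n = replace_na(n, 0)) 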

tail(out1)
# A tibble: 6 x 2
#  Date              n
#  <S3: yearmon> <dbl>
#1 Mar 2014        100
#2 Mar 2015         94
#3 Mar 2016         96
#4 Mar 2017         93
#5 Mar 2018        126
#6 Mar 2019        163
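
The zero-filling step can also be sketched without tidyr. The following is a small base R illustration under stated assumptions: the `out` data frame here is invented (three rows, with February 2001 missing) and uses "YYYY-MM" strings instead of yearmon values, so it is not the answer's actual result.

```r
# Hypothetical counts with a gap: no row for 2001-02
out <- data.frame(Date = c("2000-12", "2001-01", "2001-03"),
                  n = c(23L, 16L, 8L), stringsAsFactors = FALSE)

# Range of years covered by the counts
yr <- as.integer(substr(out$Date, 1, 4))
years <- min(yr):max(yr)

# Every winter month (Jan-Mar, Dec) for every year in that range
grid <- expand.grid(Month = c(1:3, 12L), Year = years)
grid$Date <- sprintf("%d-%02d", grid$Year, grid$Month)

# Left join the counts onto the full grid, then replace missing counts with 0
filled <- merge(grid["Date"], out, by = "Date", all.x = TRUE)
filled$n[is.na(filled$n)] <- 0L
filled
```

Every year-month in the grid gets a row, and combinations absent from `out` end up with a count of 0 instead of being dropped.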

Answer 1 (score: 0)

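With dplyr, you can derive the variable of interest ("year-month") from the Date column, group by that variable, and then count the number of rows in each group. In the same way, you can filter to keep only the months you want: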

library(dplyr)
winter_months <- c(1:3, 12)

avalancheslc %>% 
    mutate(Date = as.Date(Date, "%m/%d/%Y")) %>% 
    mutate(YearMonth = format(Date,"%Y-%m"), 
           Month = as.numeric(format(Date,"%m"))) %>%
    filter(Month %in% winter_months) %>%
    group_by(YearMonth) %>%
    summarise(AvalancheTotal = n())
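
For a quick check of the same parse, filter and count logic without the scraped table, here is a rough base R sketch on toy data. The dates in `toy` are invented for illustration, and the names `toy`, `winter_rows` and `counts` are hypothetical; it assumes the Date column holds m/d/Y strings as in the question.

```r
# Toy data (hypothetical dates, not from the avalanche table)
toy <- data.frame(Date = c("1/5/2000", "1/20/2000", "2/3/2000",
                           "12/25/2000", "7/4/2001"),
                  stringsAsFactors = FALSE)

# Parse m/d/Y strings, then derive the year-month key and the numeric month
d <- as.Date(toy$Date, "%m/%d/%Y")
toy$YearMonth <- format(d, "%Y-%m")
toy$Month <- as.numeric(format(d, "%m"))

# Keep winter months only, then count rows per year-month
winter_rows <- toy[toy$Month %in% c(1:3, 12), ]
counts <- as.data.frame(table(Date = winter_rows$YearMonth),
                        responseName = "AvalancheTotal",
                        stringsAsFactors = FALSE)
counts
```

The July row is filtered out, and the two January 2000 dates collapse into a single row with a count of 2.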