How to remove multiple year-months from a dataset

Time: 2019-03-28 23:57:09

Tags: r

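I have this dataset, but I only want months 1, 2, 3, and 12, for every year in which those months occur. The dates are in year-month format, and I need to keep that format so that I can eventually merge with another dataset. Thanks for your help.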

# write the webscraper
library(XML)
library(RCurl)
library(dplyr)
library('zoo')
library('tidyverse')
library('lubridate')
avalanche<-data.frame()
avalanche.url<-"https://utahavalanchecenter.org/observations?page="
all.pages<-0:202
for(page in all.pages){
  this.url<-paste(avalanche.url, page, sep="")
  this.webpage<-htmlParse(getURL(this.url))
  thispage.avalanche<-readHTMLTable(this.webpage, which=1, header=T,stringsAsFactors=F)
  names(thispage.avalanche)<-c('Date','Region','Location','Observer')
  avalanche<-rbind(avalanche,thispage.avalanche)
}

# subset the data to the Salt Lake Region
avalancheslc<-subset(avalanche, Region=="Salt Lake")
str(avalancheslc)

# convert the dates and total the number of avalanches
avalancheslc <- avalancheslc %>% 
          group_by(Date = format(as.yearmon(Date, "%m/%d/%Y"), "%Y-%m")) %>% 
          summarise(AvalancheTotal = n())
# pipe to only include Dec-Mar of each year
avalancheslc <- avalancheslc %>% filter(as.integer(substr(Date, 6, 7)) %in% c(12, 1:3))




avalancheslc <- avalancheslc %>% mutate(Date = parse_date_time(Date, "%y-%m"))


# A full data frame of months
all_months <- avalancheslc %>% expand(Date = seq(first(Date), last(Date), by = "month"))

# Join to `avalanches` and fill in with 0s
avalancheslc <- avalancheslc %>% right_join(all_months) %>% replace_na(list(AvalancheTotal = 0))

# convert date back to Year-Month format
avalancheslc$Date<-format(avalancheslc$Date, "%Y-%m")


The result should look like this:

Date    AvalancheTotal
1980-01         1
1980-02         0
1980-03         0
1980-12         0
1981-01         0
1981-02         1
...
2019-03        163

1 answer:

Answer 0 (score: 1)

You can use the lubridate package for something like this. Use month() to extract the month as an integer, then filter on the months you need:

library(dplyr)
library(lubridate)
df %>%
  filter(month(date) %in% c(1, 2, 3, 12))
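
One possible reason the code in the question still ends up with every month is that the Dec-Mar filter runs before the join against the full month sequence, which puts April through November back in. The sketch below applies the filter after the gaps are filled instead. It is only an illustration under stated assumptions: the `monthly` tibble is made-up sample data standing in for the scraped totals, and `tidyr::complete()` is swapped in for the `expand()` plus `right_join()` steps. Note that `month()` needs a real date, so the "YYYY-MM" strings are first converted by appending a day.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical monthly totals standing in for the scraped data (assumption)
monthly <- tibble(
  Date = c("1980-01", "1981-02", "1981-12"),
  AvalancheTotal = c(1L, 1L, 2L)
)

result <- monthly %>%
  # "YYYY-MM" is not a full date, so append a day before converting
  mutate(Date = as.Date(paste0(Date, "-01"))) %>%
  # add every missing month between the first and last date, with a total of 0
  complete(Date = seq(min(Date), max(Date), by = "month"),
           fill = list(AvalancheTotal = 0L)) %>%
  # filter AFTER filling the gaps, so only Dec-Mar survive
  filter(month(Date) %in% c(12, 1:3)) %>%
  # convert back to the year-month text format needed for the later merge
  mutate(Date = format(Date, "%Y-%m"))

print(result)
```

Because the last step restores the "YYYY-MM" strings, the result should still merge with another dataset keyed on the same format.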