Web Scrape Nonfarm Payroll Dates in R

Asked: 2016-04-23 18:26:36

Tags: html r web-scraping

I would like to web scrape the past release dates for Nonfarm Payrolls from here http://www.bls.gov/bls/archived_sched.htm (archive) and here http://www.bls.gov/schedule/news_release/empsit.htm (current year).

Peter Chan has done something similar for the FOMC dates here: https://github.com/returnandrisk/r-code/blob/master/FOMC%20Dates%20-%20Scraping%20Data%20From%20Web%20Pages.R. This is his code:

install.packages(c("httr", "XML"), repos = "http://cran.us.r-project.org")

library(httr)
library(XML)

# get and parse web page content
webpage <- content(GET("http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm"), as="text")
xhtmldoc <- htmlParse(webpage)
# get statement urls and sort them
statements <- xpathSApply(xhtmldoc, "//td[@class='statement2']/a", xmlGetAttr, "href")
statements <- sort(statements)
# get dates from statement urls
fomcdates <- sapply(statements, function(x) substr(x, 28, 35))
fomcdates <- as.Date(fomcdates, format="%Y%m%d")
# save results in working directory
save(list = c("statements", "fomcdates"), file = "fomcdates.RData")
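
The date-extraction step in the script above depends on fixed character positions in each statement URL. Here is a minimal sketch of that step on a single href. The href is an assumption chosen to fit the 28 to 35 offsets, not a value taken from the page, and the regular-expression variant is an alternative that is not part of the original script.

```r
# Hypothetical href, assumed to follow the pattern the script relies on:
# a 27-character path prefix followed by an 8-digit date.
href <- "/newsevents/press/monetary/20160127a.htm"

# Characters 28 through 35 hold the YYYYMMDD date.
datestr <- substr(href, 28, 35)
fomcdate <- as.Date(datestr, format = "%Y%m%d")

# Position-free alternative: pull the first run of eight digits.
datestr2 <- regmatches(href, regexpr("[0-9]{8}", href))
stopifnot(identical(datestr, datestr2))
```

The fixed offsets break if the path prefix changes length, which is why the digit-matching variant may be more robust.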

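I would like to replicate this for NFP. Just as fomcdates contains all the FOMC dates, I would like to create nfpdates containing all the NFP dates.

Does anyone know how to do this for the current year? (Asking about the current year seems simplest.) Thanks.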

1 Answer:

Answer 0: (score: 1)

This works for the current year.

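library(rvest)

# Schedule page for the Employment Situation (NFP) release, current year
url <- 'http://www.bls.gov/schedule/news_release/empsit.htm'
ses <- html_session(url)
# Parse every HTML table on the page, padding short rows with NA
tbl <- html_table(ses, fill = TRUE)
# This code assumes the schedule is the second table on the page
nfpdates <- tbl[[2]]$`Release Date`
# Strip the period after abbreviated month names (e.g. "Jan." -> "Jan")
nfpdates <- gsub('\\.', '', nfpdates)
# %b matches abbreviated month names in the current locale
nfpdates <- as.Date(nfpdates, '%b %d, %Y')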
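
The cleaning and parsing steps can be checked offline before running them against the live page. The sketch below uses made-up sample strings in the format the code above assumes ("Mon. DD, YYYY"); they are illustrative and were not scraped from the BLS site. It also assumes an English (or C) LC_TIME locale, since %b matches month abbreviations in the current locale.

```r
# Hypothetical sample strings in the assumed "Mon. DD, YYYY" format;
# illustrative only, not scraped from the BLS page.
raw <- c("Jan. 08, 2016", "Feb. 05, 2016", "Mar. 04, 2016")

# Drop the period after the abbreviated month name.
cleaned <- gsub(".", "", raw, fixed = TRUE)

# Convert to Date; %b depends on the LC_TIME locale.
nfpdates <- as.Date(cleaned, format = "%b %d, %Y")

# as.Date returns NA for strings it cannot parse, so fail loudly here
# rather than carrying silent NAs forward.
stopifnot(!anyNA(nfpdates))
```

Any month label that is not a standard three-letter abbreviation would come back as NA, so the NA check is worth keeping when the code is pointed at real data.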