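I would like to know how to download the LEHD files from their FTP site.
https://lehd.ces.census.gov/data/lodes/LODES7/
I need to download multiple years of data for both workplace and residence locations. The files are named in a regular pattern, and the technical documentation can be found here:
https://lehd.ces.census.gov/data/lodes/LODES7/LODESTechDoc7.2.pdf S000 refers to all workforce segments; JT00 refers to all job types.
So a typical file name is ca_wac_S000_JT00_2008.csv.gz, found under the directory URL https://lehd.ces.census.gov/data/lodes/LODES7/ca/wac/
This bit of GitHub code seems relevant. The Harvard tutorial was very useful and gave me a way to create a list of all the files. But I can't get the actual download to work because I run into SSL problems; RCurl hasn't worked for me.
Extending the code to my case does not seem to work: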
install.packages("RCurl")
library(RCurl)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
x
#the above code works.
#my implementation...fails
URL <- "https://lehd.ces.census.gov/data/lodes/LODES7/ca/wac/ca_wac_S000_JT00_2002.csv.gz"
x <- getURL(URL)
#results in following error:
#Error in function (type, msg, asError = TRUE) :
# error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure
devtools::session_info()
Session info -------------------------------------------------------------
 setting  value
 version  R version 3.4.3 (2017-11-30)
 system   x86_64, mingw32
 ui       RStudio (1.1.383)
 language (EN)
 collate  English_United States.1252
 tz       America/Denver
 date     2017-12-17
Packages -----------------------------------------------------------------
 package    * version  date       source
 acs        * 2.1.2    2017-10-10 CRAN (R 3.4.3)
 assertthat   0.2.0    2017-04-11 CRAN (R 3.4.3)
 base       * 3.4.3    2017-12-06 local
 bindr        0.1      2016-11-13 CRAN (R 3.4.3)
 bindrcpp     0.2      2017-06-17 CRAN (R 3.4.3)
 class        7.3-14   2015-08-30 CRAN (R 3.4.3)
 classInt     0.1-24   2017-04-16 CRAN (R 3.4.3)
 compiler     3.4.3    2017-12-06 local
 curl       * 3.1      2017-12-12 CRAN (R 3.4.3)
 datasets   * 3.4.3    2017-12-06 local
 DBI          0.7      2017-06-18 CRAN (R 3.4.3)
 devtools   * 1.13.4   2017-11-09 CRAN (R 3.4.3)
 digest       0.6.13   2017-12-14 CRAN (R 3.4.3)
 dplyr      * 0.7.4    2017-09-28 CRAN (R 3.4.3)
 e1071        1.6-8    2017-02-02 CRAN (R 3.4.3)
 foreign      0.8-69   2017-06-22 CRAN (R 3.4.3)
 gdtools    * 0.1.6    2017-09-01 CRAN (R 3.4.3)
 git2r        0.19.0   2017-07-19 CRAN (R 3.4.3)
 glue         1.2.0    2017-10-29 CRAN (R 3.4.3)
 graphics   * 3.4.3    2017-12-06 local
 grDevices  * 3.4.3    2017-12-06 local
 grid         3.4.3    2017-12-06 local
 hms          0.4.0    2017-11-23 CRAN (R 3.4.3)
 httr         1.3.1    2017-08-20 CRAN (R 3.4.3)
 lattice      0.20-35  2017-03-25 CRAN (R 3.4.3)
 lodes      * 0.1.0    2017-12-17 git (@8cca008)
 magrittr     1.5      2014-11-22 CRAN (R 3.4.3)
 maptools     0.9-2    2017-03-25 CRAN (R 3.4.3)
 memoise      1.1.0    2017-04-21 CRAN (R 3.4.3)
 methods    * 3.4.3    2017-12-06 local
 pkgconfig    2.0.1    2017-03-21 CRAN (R 3.4.3)
 plyr         1.8.4    2016-06-08 CRAN (R 3.4.3)
 purrr        0.2.4    2017-10-18 CRAN (R 3.4.3)
 R6           2.2.2    2017-06-17 CRAN (R 3.4.3)
 rappdirs     0.3.1    2016-03-28 CRAN (R 3.4.3)
 Rcpp         0.12.14  2017-11-23 CRAN (R 3.4.3)
 readr        1.1.1    2017-05-16 CRAN (R 3.4.3)
 rgdal        1.2-16   2017-11-21 CRAN (R 3.4.3)
 rgeos        0.3-26   2017-10-31 CRAN (R 3.4.3)
 rlang        0.1.4    2017-11-05 CRAN (R 3.4.3)
 sf           0.5-5    2017-10-31 CRAN (R 3.4.3)
 sp         * 1.2-5    2017-06-29 CRAN (R 3.4.3)
 stats      * 3.4.3    2017-12-06 local
 stringi      1.1.6    2017-11-17 CRAN (R 3.4.2)
 stringr    * 1.2.0    2017-02-18 CRAN (R 3.4.3)
 tibble       1.3.4    2017-08-22 CRAN (R 3.4.3)
 tigris     * 0.5.3    2017-05-26 CRAN (R 3.4.3)
 tools        3.4.3    2017-12-06 local
 udunits2     0.13     2016-11-17 CRAN (R 3.4.1)
 units        0.4-6    2017-08-27 CRAN (R 3.4.3)
 utils      * 3.4.3    2017-12-06 local
 uuid         0.1-2    2015-07-28 CRAN (R 3.4.1)
 withr        2.1.0    2017-11-01 CRAN (R 3.4.3)
 XML        * 3.98-1.9 2017-06-19 CRAN (R 3.4.1)
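The regular naming pattern described in the question (state, file type, segment, job type, year) can be sketched as a small helper that builds every URL of interest. This is only an illustration: the function name `lodes_urls` and its arguments are hypothetical, not part of any package, and the pattern is assumed to hold for both the `wac` and `rac` directories.

```r
# Hypothetical helper: build LODES7 file URLs from the naming pattern
# <state>_<type>_<segment>_<jobtype>_<year>.csv.gz
lodes_urls <- function(states, types, years,
                       segment = "S000",   # all workforce segments
                       job_type = "JT00",  # all job types
                       base = "https://lehd.ces.census.gov/data/lodes/LODES7") {
  # One row per state/type/year combination
  combos <- expand.grid(state = states, type = types, year = years,
                        stringsAsFactors = FALSE)
  file_names <- sprintf("%s_%s_%s_%s_%d.csv.gz",
                        combos$state, combos$type, segment, job_type,
                        as.integer(combos$year))
  # Files live under <base>/<state>/<type>/
  paste(base, combos$state, combos$type, file_names, sep = "/")
}

urls <- lodes_urls("ca", c("wac", "rac"), 2002:2004)
print(urls)
```

This only constructs the URLs; it does not download anything, so it is unaffected by the SSL problem.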
Answer 0 (score: 3)
If you can use GitHub-installable packages (it will be a little while before I get it on CRAN), then you can give https://github.com/hrbrmstr/lodes a go:
devtools::install_git("https://github.com/hrbrmstr/lodes.git")
library(lodes)
library(dplyr)
de <- read_lodes("de", "od", "aux", "JT00", "2006", "~/Data/lodes")
glimpse(de)
## Observations: 68,284
## Variables: 13
## $ w_geocode <dbl> 1.000104e+14, 1.000104e+14, 1.000104e+14, 1.000104e+14, 1.000104e+14, 1.000104e+14, 1.000104e+14...
## $ h_geocode <chr> "240119550001006", "240119550001040", "240299501002080", "240299501003088", "240299503002017", "...
## $ S000 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ SA01 <int> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, ...
## $ SA02 <int> 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, ...
## $ SA03 <int> 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ...
## $ SE01 <int> 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, ...
## $ SE02 <int> 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, ...
## $ SE03 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, ...
## $ SI01 <int> 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, ...
## $ SI02 <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ SI03 <int> 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ...
## $ createdate <int> 20160228, 20160228, 20160228, 20160228, 20160228, 20160228, 20160228, 20160228, 20160228, 201602...
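It has a function to read and cache the crosswalk files and a function to read and cache individual data files.
If you still get SSL failures, please leave a comment. If so, please add the output of devtools::session_info()
or sessionInfo()
to your question.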
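Alongside the session info, it may help to report which SSL library each HTTP package is linked against, since a handshake failure can occur when the client and server cannot agree on a protocol version. The sketch below is an assumption about what is useful to report, not a confirmed diagnosis; `ssl_backends` is a hypothetical helper name, while `curl::curl_version()` and `RCurl::curlVersion()` are the packages' own version functions.

```r
# Hypothetical helper: report the SSL library used by curl and RCurl.
# Returns NA for a package that is not installed.
ssl_backends <- function() {
  first_or_na <- function(x) as.character(x)[1]
  c(
    curl = if (requireNamespace("curl", quietly = TRUE)) {
      first_or_na(curl::curl_version()$ssl_version)
    } else {
      NA_character_
    },
    RCurl = if (requireNamespace("RCurl", quietly = TRUE)) {
      first_or_na(RCurl::curlVersion()$ssl_version)
    } else {
      NA_character_
    }
  )
}

print(ssl_backends())
```

If the two packages report different SSL libraries, that difference is worth including in the question.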
Answer 1 (score: 1)
I found a solution here. It is not perfect, because it loads the files into memory rather than saving them to disk. But it works for me.
years.to.download <- c(2002,2004,2014)
options(scipen = 999) # Suppress scientific notation so we can see census geocodes
library(plyr); library(dplyr)
library(downloader) # downloads and then runs the source() function on scripts from github
library(R.utils) # load the R.utils package (counts the number of lines in a file quickly)
# Program start ----------------------------------------------------------------
tf <- tempfile(); td <- tempdir() # Create a temporary file and a temporary directory
# Load the download.cache and related functions
# to prevent re-downloading of files once they've been downloaded.
source_url(
"https://raw.github.com/ajdamico/asdfree/master/Download%20Cache/download%20cache.R",
prompt = FALSE,
echo = FALSE
)
# Loop through and download each year specified by the user
for(year in years.to.download) {
cat("now loading", year, "...", '\n\r')
#-----------Data import: workplace area characteristics---------------------
# (i.e. job location data)
# Download each year of data
# Zipped file to the temporary file on your local disk
# S000 references all workforce segments
# JT00 references all job types
download_cached(
url = paste0("http://lehd.ces.census.gov/data/lodes/LODES7/ca/wac/ca_wac_S000_JT00_", year, ".csv.gz"),
destfile = tf,
mode = 'wb'
)
# Create a variable to store the wac file for each year
assign(paste0("wac.", year), read.table(gzfile(tf), header = TRUE, sep = ",",
colClasses = "numeric", stringsAsFactors = FALSE))
# Remove the temporary file from the local disk
file.remove(tf)
# And free up RAM
gc()
#-----------Data import: residence area characteristics---------------------
download_cached(
url = paste0("http://lehd.ces.census.gov/data/lodes/LODES7/ca/rac/ca_rac_S000_JT00_", year, ".csv.gz"),
destfile = tf,
mode = 'wb'
)
# Create a variable to store the rac file for each year
assign(paste0("rac.", year), read.table(gzfile(tf), header = TRUE, sep = ",",
colClasses = "numeric", stringsAsFactors = FALSE))
# Remove the temporary file from the local disk
file.remove(tf)
# And free up RAM
gc()
}
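To address the limitation noted above (the files end up in memory rather than on disk), a minimal sketch using base R's `download.file()` is shown here. It assumes the same plain `http` URLs used in the loop above are reachable; `save_to_disk` and `out_dir` are hypothetical names chosen for illustration, and the download call itself is left commented out so nothing is fetched until you opt in.

```r
years.to.download <- c(2002, 2004, 2014)
types <- c("wac", "rac")

# Build one URL per file type and year, following the pattern used above
combos <- expand.grid(type = types, year = years.to.download,
                      stringsAsFactors = FALSE)
urls <- paste0("http://lehd.ces.census.gov/data/lodes/LODES7/ca/",
               combos$type, "/ca_", combos$type, "_S000_JT00_",
               combos$year, ".csv.gz")

# Hypothetical helper: download a file into `dir` unless it is already there,
# and return the local path either way.
save_to_disk <- function(url, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dir, basename(url))
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")  # binary mode for .gz
  }
  dest
}

# Assumption: any writable directory works; a temporary one is used here.
out_dir <- file.path(tempdir(), "lodes")

# Uncomment to download everything and read one file back from disk:
# files <- vapply(urls, save_to_disk, character(1), dir = out_dir)
# wac.2002 <- read.csv(gzfile(files[1]))
```

Skipping files that already exist gives a simple form of caching, so re-running the script does not download the same year twice.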