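The code example below downloads census data for each state from the census.gov website. The problem I'm running into is that when I download the files as CSV, the leading zeros are dropped. How can I preserve the original structure of the files when downloading?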
stateFIPScodes<-seq(10,13,1)
for(i in 1:length(stateFIPScodes)){
URL<-paste("https://www.census.gov/popest/data/intercensal/county/files/CO-EST00INT-ALLDATA-",stateFIPScodes[i],".csv",sep="" )
destfile<-paste("state2000_2010_",stateFIPScodes[i],".csv" ,sep="") # CSV files drop leading zero!!
download.file(URL, destfile)
}
Thanks!
Answer 0 (score: 1)
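It looks like the files are downloaded correctly, but you're right: if you load the data into R with read.csv, some columns are interpreted as numeric, so they lose their leading zeros.
Code to get the downloaded files: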
stateFIPScodes <- seq(10, 13, 1)
for (i in seq_along(stateFIPScodes)) {
  code <- stateFIPScodes[[i]]
  URL <- paste0("https://www.census.gov/popest/data/intercensal/county/files/CO-EST00INT-ALLDATA-", code, ".csv")
  destfile <- paste0("state2000_2010_", code, ".csv")
  download.file(URL, destfile)
}
If we use base read.csv, we don't get the leading zeros:
library(dplyr)
read.csv("state2000_2010_10.csv") %>%
  select(1:5) %>%
  head()
#>   SUMLEV STATE COUNTY   STNAME     CTYNAME
#> 1     50    10      1 Delaware Kent County
#> 2     50    10      1 Delaware Kent County
#> 3     50    10      1 Delaware Kent County
#> 4     50    10      1 Delaware Kent County
#> 5     50    10      1 Delaware Kent County
#> 6     50    10      1 Delaware Kent County
That's because the first few columns are read in as numeric:
read.csv("state2000_2010_10.csv") %>% str()
#> 'data.frame': 780 obs. of 50 variables:
#> $ SUMLEV : int 50 50 50 50 50 50 50 50 50 50 ...
#> $ STATE : int 10 10 10 10 10 10 10 10 10 10 ...
#> $ COUNTY : int 1 1 1 1 1 1 1 1 1 1 ...
There are two ways to fix this.
Either use readr::read_csv, or just stop all conversion by adding colClasses = "character" to read.csv. First, preventing the automatic coercion:
read.csv("state2000_2010_10.csv", colClasses = "character") %>% str()
#> 'data.frame': 780 obs. of 50 variables:
#> $ SUMLEV : chr "050" "050" "050" "050" ...
#> $ STATE : chr "10" "10" "10" "10" ...
#> $ COUNTY : chr "001" "001" "001" "001" ...
#> $ STNAME : chr "Delaware" "Delaware" "Delaware" "Delaware" ...
You then need to choose which columns to cast back with as.numeric.
Or you can set the class for selected columns only, e.g.
read.csv("state2000_2010_10.csv",
         colClasses = c(
           SUMLEV = "character",
           STATE = "numeric",
           COUNTY = "character"
         )) %>%
  select(1:5) %>%
  head()
#>   SUMLEV STATE COUNTY   STNAME     CTYNAME
#> 1    050    10    001 Delaware Kent County
#> 2    050    10    001 Delaware Kent County
#> 3    050    10    001 Delaware Kent County
#> 4    050    10    001 Delaware Kent County
#> 5    050    10    001 Delaware Kent County
#> 6    050    10    001 Delaware Kent County
Second, you can use readr, which has smarter column type inference.
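A minimal sketch of the readr approach, assuming the readr package is installed. The temporary file and its two data rows are made up for illustration and stand in for state2000_2010_10.csv, so the snippet runs without downloading anything. The column types are spelled out with col_types here so the result does not depend on what the type guesser decides; whether guessing alone keeps the zeros is worth checking on your own readr version.

```r
library(readr)

# Hypothetical stand-in for a downloaded file (not real census rows).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "SUMLEV,STATE,COUNTY,STNAME,CTYNAME",
  "050,10,001,Delaware,Kent County",
  "050,10,003,Delaware,New Castle County"
), tmp)

# Ask for character columns where leading zeros matter;
# every other column is left to readr's type guessing.
dat <- read_csv(
  tmp,
  col_types = cols(
    SUMLEV = col_character(),
    COUNTY = col_character(),
    .default = col_guess()
  )
)

# Both code columns come back as character vectors with the zeros intact.
dat$SUMLEV
dat$COUNTY
```

To apply this to the real data, replace tmp with the path of a downloaded file such as "state2000_2010_10.csv".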