I'm trying to figure out how to read some text files from the BLS database into R.
url <- "http://download.bls.gov/pub/time.series/oe/oe.datatype"
datatype <- read.table(url)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 6 elements
I also tried:
datatype <- read.table(url, header = FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 6 elements
and
datatype <- read.table(url, sep="\t")
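This last approach almost works, but when I inspect the data frame, it looks like the first column has been turned into row names and the last column is filled with NAs.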
datatype
datatype_code datatype_name
01 Employment NA
02 Employment percent relative standard error NA
03 Hourly mean wage NA
04 Annual mean wage NA
I also tried downloading the file and inspecting it, but I'm not sure what I'm looking at in Notepad++.
download.file(url, "datatype.txt")
datatype <- read.table("datatype.txt", sep='\t')
datatype
datatype_code datatype_name
01 Employment NA
02 Employment percent relative standard error NA
03 Hourly mean wage NA
04 Annual mean wage NA
Thanks for any tips. Just trying to learn.
Answer 0 (score: 2)
As @zx8754 pointed out, this particular file has an extra tab character ("\t") in every line except the header line.
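If you want to see the mismatch for yourself, one option is to count the tabs on each line. This is only a sketch: rather than downloading the real file, it writes a small stand-in with the same layout as the rows shown in the question. The names `sample_lines` and `path` are made up for illustration.

```r
# Stand-in for oe.datatype (assumed layout): the header has two fields,
# while every data line carries one extra tab.
sample_lines <- c(
  "datatype_code\tdatatype_name",
  "01\tEmployment\t",
  "02\tEmployment percent relative standard error\t"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

lines <- readLines(path)

# Count the tab characters on each line.
n_tabs <- lengths(regmatches(lines, gregexpr("\t", lines, fixed = TRUE)))
n_tabs
```

The header line has one tab and the data lines have two, so the header has one fewer field than the data. `?read.table` documents that in this situation the first column is used for the row names, which matches what the question shows.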
You can read the file without the header:
url <- "http://download.bls.gov/pub/time.series/oe/oe.datatype"
df <- read.delim(url, skip = 1, header = FALSE)
head(df)
# V1 V2 V3
# 1 1 Employment NA
# 2 2 Employment percent relative standard error NA
# 3 3 Hourly mean wage NA
# 4 4 Annual mean wage NA
# 5 5 Wage percent relative standard error NA
# 6 6 Hourly 10th percentile wage NA
You can also read the header from the first line separately:
header <- read.delim(url, nrows = 1, header = FALSE, stringsAsFactors = FALSE)
names(df) <- header
head(df)
# datatype_code datatype_name NA
# 1 1 Employment NA
# 2 2 Employment percent relative standard error NA
# 3 3 Hourly mean wage NA
# 4 4 Annual mean wage NA
# 5 5 Wage percent relative standard error NA
# 6 6 Hourly 10th percentile wage NA
At this point you may want to drop the third column:
df <- df[-3]
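Note that the codes above come back as the integers 1, 2, ... rather than "01", "02", ... If the leading zeros matter, a variation along these lines might help. It is a sketch against a small stand-in file (the `sample_lines` and `path` names are hypothetical), not the real download. It reads every column as character and keeps only as many columns as the header names.

```r
# Stand-in for oe.datatype (assumed layout, not the real download).
sample_lines <- c(
  "datatype_code\tdatatype_name",
  "01\tEmployment\t",
  "02\tEmployment percent relative standard error\t"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Take the column names from the first line only.
header <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]

# Read the data rows as text so that codes such as "01" keep their leading zero.
df <- read.delim(path, skip = 1, header = FALSE, colClasses = "character")

# Keep only the columns the header describes, then name them.
df <- df[seq_along(header)]
names(df) <- header
df$datatype_code
```

Against the real file you would pass `url` (or the downloaded "datatype.txt") in place of `path`.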
Answer 1 (score: 0)
Here is a tidyverse option that works well. It turns out that readr::read_tsv handles this file effectively.
library(tidyverse)
df <- read_tsv(url)
head(df)
# A tibble: 6 x 2
datatype_code datatype_name
<chr> <chr>
1 01 Employment
2 02 Employment percent relative standard error
3 03 Hourly mean wage
4 04 Annual mean wage
5 05 Wage percent relative standard error
6 06 Hourly 10th percentile wage