Question: Importing a txt file into R with readr instead of read.table

Date: 2016-06-14 07:14:48

Tags: r readr

I am trying to import the following text file:

   "year"   "sex"   "name"       "n"    "prop"
"1" 1880    "F"     "Mary"      7065    0.0723835869064085
"2" 1880    "F"     "Anna"      2604    0.0266789611187951
"3" 1880    "F"     "Emma"      2003    0.0205214896777829
"4" 1880    "F"     "Elizabeth" 1939    0.0198657855642641
"5" 1880    "F"     "Minnie"    1746    0.0178884278469341
"6" 1880    "F"     "Margaret"  1578    0.0161672045489473
"7" 1880    "F"     "Ida"       1472    0.0150811946109318
"8" 1880    "F"     "Alice"     1414    0.0144869627580554
"9" 1880    "F"     "Bertha"    1320    0.0135238973413247
"10" 1880   "F"     "Sarah"     1288    0.0131960452845653

This works without any problem:

data <- read.table("~/Documents/baby_names.txt", header = TRUE, sep = "\t")

However, I have not figured out how to do the same thing with readr. The following command fails:

data2 <- read_tsv("~/Documents/baby_names.txt")

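I know the problem comes from the first line (the header) containing five elements while every other line contains six, but I don't know how to tell readr to ignore the "1", "2", "3", and so on. Any suggestions?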
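One way to confirm the mismatch described above is to count the fields on each line. The sketch below assumes readr's `count_fields()` together with `tokenizer_tsv()`, and it writes a three-line stand-in to a temporary file rather than reading the real `baby_names.txt`; the stand-in also assumes the fields are separated by tabs.

```r
library(readr)

# Stand-in for the first lines of the file (assumed to be tab-separated).
sample_lines <- c(
  '"year"\t"sex"\t"name"\t"n"\t"prop"',
  '"1"\t1880\t"F"\t"Mary"\t7065\t0.0723835869064085',
  '"2"\t1880\t"F"\t"Anna"\t2604\t0.0266789611187951'
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Number of tab-separated fields found on each line of the file.
field_counts <- count_fields(tmp, tokenizer_tsv())
print(field_counts)
```

If the header line reports one field fewer than the data lines, the extra leading field on each data line is the row label.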

2 Answers:

Answer 0: (score: 1)

We can read the file in two steps (untested):

library(readr)

# read only the header line as data, then convert it to a character vector
myNames <- unlist(read_tsv(file = "myFile.tsv", col_names = FALSE, n_max = 1), use.names = FALSE)

# read the data, skip the header line, then drop the 1st column (the row labels)
myData <- read_tsv(file = "myFile.tsv", skip = 1, col_names = FALSE)[, -1]

# assign column names
colnames(myData) <- myNames

Answer 1: (score: 0)

You can read the body and the column names separately, then combine them:

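library(readr)

# read the body without a header; readr names the columns X1, X2, ...
df <- read_tsv("baby_names.txt", col_names = FALSE, skip = 1)

# read only the header line
col_names <- read.table("baby_names.txt", header = FALSE, sep = "\t", nrows = 1)

# drop the row-label column, then apply the header
df$X1 <- NULL
names(df) <- col_names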

Result:

> head(df)
     1     1         1    1          1
1 1880 FALSE      Mary 7065 0.07238359
2 1880 FALSE      Anna 2604 0.02667896
3 1880 FALSE      Emma 2003 0.02052149
4 1880 FALSE Elizabeth 1939 0.01986579
5 1880 FALSE    Minnie 1746 0.01788843
6 1880 FALSE  Margaret 1578 0.01616720

I don't think handling row names in read_tsv() is as straightforward as it is in read.table(), but this should be an adequate workaround.
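A possible single-call alternative, sketched under assumptions rather than taken from either answer: skip the header line, supply six column names by hand, and drop the row-label column with `col_skip()`. The name `row_id` is made up for the unnamed first column. Declaring `sex` as `col_character()` is intended to stop "F" from being guessed as a logical value, which appears to be what produced the `FALSE` entries in the result above. The data is a temporary stand-in for `baby_names.txt` and assumes tab-separated fields.

```r
library(readr)

# Stand-in for the first lines of the file (assumed to be tab-separated).
sample_lines <- c(
  '"year"\t"sex"\t"name"\t"n"\t"prop"',
  '"1"\t1880\t"F"\t"Mary"\t7065\t0.0723835869064085',
  '"2"\t1880\t"F"\t"Anna"\t2604\t0.0266789611187951'
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

baby_names <- read_tsv(
  tmp,
  skip = 1,  # ignore the five-field header line
  col_names = c("row_id", "year", "sex", "name", "n", "prop"),
  col_types = cols(
    row_id = col_skip(),     # hypothetical name for the row-label column
    sex = col_character()    # keep "F" as text instead of a guessed logical
  )
)
print(baby_names)
```

The trade-off is that the column names are typed in by hand instead of being read from the file, so they have to be kept in sync with the header.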