Need help reading online txt data into R

Time: 2016-07-07 18:44:49

Tags: r

I'm having trouble reading http://kenpom.com/cbbga16.txt into R as a data frame. I tried

 read.table("http://kenpom.com/cbbga16.txt", header=FALSE, sep="\t" , fill=T)

but the columns are not split correctly. Please help!
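Before choosing a reader, it can help to look at the raw lines. The sketch below uses two illustrative lines (modeled on rows shown in the glimpse() output further down, not copied from the file) and assumes, as both answers suggest, that the columns are aligned with spaces rather than separated by tabs.

```r
# Illustrative lines laid out like the file: columns aligned with spaces, no tabs.
# For the real data you would use: lines <- readLines("http://kenpom.com/cbbga16.txt")
lines <- c(
  sprintf("%-11s%-20s%3s %-20s%3s %s", "11/13/2015", "Washington", "77", "Texas", "71", "N"),
  sprintf("%-11s%-20s%3s %-20s%3s", "11/13/2015", "Johnson FL", "71", "Florida A&M", "103")
)

any(grepl("\t", lines, fixed = TRUE))        # FALSE: there is no tab to split on
ncol(read.table(text = lines, sep = "\t"))   # 1: each whole line lands in one column

# Splitting on runs of whitespace instead (the default sep = "") breaks
# multi-word names apart, so rows end up with different numbers of fields.
tc <- textConnection(lines)
n_fields <- count.fields(tc)
close(tc)
n_fields                                     # 6 7
```

Neither a tab nor plain whitespace is a reliable separator here, which is why both answers treat the file as fixed-width text.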

2 answers:

Answer 0 (score: 1)

As alistaire mentioned above, you can use read.fwf with the following widths:

data <- read.fwf('http://kenpom.com/cbbga16.txt', widths=c(11,24,3,23,4,4,21))

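Note, however, that these widths were chosen by eye and only fit this particular file. If there were some kind of delimiter, the data would be much easier to handle; my guess is that it was lost when the data was converted to a text file.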
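A self-contained sketch of the same call, run on two illustrative lines that are laid out with exactly these widths (they are constructed for the example, not taken from kenpom.com). The column names are hypothetical labels, and strip.white = TRUE, which read.fwf() passes on to read.table(), trims the padding inside each field.

```r
# Sketch of read.fwf() with the widths from the answer above.
# The sample lines are illustrative and the column names are hypothetical;
# replace `path` with the real URL to read the full file.
widths <- c(11, 24, 3, 23, 4, 4, 21)
fmt <- paste0("%-", widths, "s", collapse = "")   # left-justify each field to its width
sample_lines <- c(
  sprintf(fmt, "11/13/2015", "Washington", "77", "Texas", "71", "N", "Shanghai, China"),
  sprintf(fmt, "11/13/2015", "Johnson FL", "71", "Florida A&M", "103", "", "")
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

games <- read.fwf(
  path,
  widths = widths,
  col.names = c("date", "team1", "score1", "team2", "score2", "flag", "site"),
  strip.white = TRUE,          # trim the padding spaces inside each field
  stringsAsFactors = FALSE     # keep text columns as character
)
str(games)
```

Because the widths are fixed positions, a name with internal spaces such as "Florida A&M" stays in one field, which is exactly what whitespace splitting could not do.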

Answer 1 (score: 1)

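The readr::read_table function almost works for this out of the box. It parses each column and separates variables at columns that are completely empty (blank in every line). Unfortunately, it is thrown off by the unequal line lengths: in the output below only 3,244 rows come back, and X7 picks up part of the next game's line.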
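To make the "completely empty columns" idea concrete, here is a rough base-R imitation. It is not readr's actual code: the helper name guess_widths() is hypothetical, and the two sample lines are illustrative, modeled on rows from the glimpse() output below. The helper has to pad the lines first, for the same reason read_table() struggles with unequal line lengths. Note also that in readr 2.0.0 and later, read_table() splits on whitespace like read.table(); the empty-column guessing described in this answer is available through read_fwf() with fwf_empty().

```r
# Hypothetical helper: guess fixed-width field widths from character positions
# that are blank in every line. A rough imitation of the idea, not readr's code.
guess_widths <- function(lines) {
  w <- max(nchar(lines))
  padded <- paste0(lines, strrep(" ", w - nchar(lines)))  # pad to equal length
  chars <- do.call(rbind, strsplit(padded, ""))            # one row per line
  blank <- colSums(chars != " ") == 0                      # TRUE where every line has a space
  # A field starts at a non-blank position that follows a blank one (or at position 1).
  starts <- which(!blank & c(TRUE, head(blank, -1)))
  # Assumes the first field starts at position 1; each field runs up to the next start.
  diff(c(starts, w + 1L))
}

# Illustrative lines of unequal length (the second has no trailing "N" column).
lines <- c(
  sprintf("%-11s%-20s%3s %-20s%3s %s", "11/13/2015", "Washington", "77", "Texas", "71", "N"),
  sprintf("%-11s%-20s%3s %-20s%3s", "11/13/2015", "Montana St.", "76", "Hawaii", "87")
)

widths <- guess_widths(lines)
widths   # 11 21 3 21 3 1

path <- tempfile(fileext = ".txt")
writeLines(lines, path)
games <- read.fwf(path, widths = widths, strip.white = TRUE, stringsAsFactors = FALSE)
games$V2   # "Washington" "Montana St."
```

This kind of guessing is fragile on small samples: with only these two rows plus a "Florida A&M" row, the blank position inside "Florida A&M" would also be blank in the other rows, and the name would be split in two. On the full file, other rows usually fill such gaps, but explicit widths (as in the other answer) avoid the problem entirely.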

> fileURL <- "http://kenpom.com/cbbga16.txt"
> 
> library(readr)
> library(stringr)
> library(tibble)
> 
> glimpse(read_table(fileURL, col_names = FALSE))
Observations: 3,244
Variables: 7
$ X1 <chr> "11/13/2015", "11/13/2015", "11/13/2015", "11/13/2015", "...
$ X2 <chr> "Washington", "Johnson FL", "Montana St.", "Monmouth", "K...
$ X3 <chr> "77", "71", "76", "84", "62", "58", "73", "60", "52", "72...
$ X4 <chr> "Texas", "Florida A&M", "Hawaii", "UCLA", "Columbia", "Se...
$ X5 <int> 71, 103, 87, 81, 107, 56, 89, 71, 80, 78, 82, 90, 41, 86,...
$ X6 <chr> "N", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1...
$ X7 <chr> "Shanghai, China", "3/2015 Pittsburgh Bradford     49 Buf...
>

Fortunately, padding the lines to the same length isn't too hard:

> tmp <- read_lines(fileURL)
> tmp <- str_pad(tmp, width = max(str_length(tmp)), side = "right")
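If stringr is not available, the same right-padding can be done in base R (strrep() requires R 3.3.0 or later). This is a minimal sketch: pad_right() is a hypothetical helper name, and the two lines are illustrative stand-ins for the result of read_lines(fileURL).

```r
# Base-R equivalent of the str_pad(..., side = "right") step.
# `pad_right()` is a hypothetical helper; `tmp` holds illustrative lines.
pad_right <- function(x) {
  width <- max(nchar(x))
  paste0(x, strrep(" ", width - nchar(x)))   # append spaces up to the longest line
}

tmp <- c("11/13/2015 Washington   77 Texas  71 N",
         "11/13/2015 Montana St.  76 Hawaii 87")
padded <- pad_right(tmp)
nchar(padded)   # both lines now have the same length
```

Only trailing spaces are added, so every existing character keeps its column position; that is what lets a column-based reader see the blank columns consistently.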

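After that, read_table determines the column boundaries correctly (now 5,952 rows, and X7 no longer contains fragments of other lines):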

> glimpse(read_table(str_c(tmp, collapse = "\n"), col_names = FALSE))
Observations: 5,952
Variables: 7
$ X1 <chr> "11/13/2015", "11/13/2015", "11/13/2015", "11/13/2015", "...
$ X2 <chr> "Washington", "Johnson FL", "Pittsburgh Bradford", "Monta...
$ X3 <chr> "77", "71", "49", "76", "65", "84", "56", "62", "50", "58...
$ X4 <chr> "Texas", "Florida A&M", "Buffalo", "Hawaii", "California"...
$ X5 <int> 71, 103, 109, 87, 97, 81, 80, 107, 70, 56, 74, 89, 63, 71...
$ X6 <chr> "N", "", "", "", "", "1", "", "", "", "", "", "", "2", ""...
$ X7 <chr> "Shanghai, China", "", "", "", "", "", "", "", "", "", ""...
>