I am having trouble reading http://kenpom.com/cbbga16.txt into R as a data frame. I tried
read.table("http://kenpom.com/cbbga16.txt", header=FALSE, sep="\t" , fill=T)
but the columns are not separated correctly. Please help!
Answer 0 (score: 1)
As alistaire mentioned above, you can use read.fwf with the following widths:
data <- read.fwf('http://kenpom.com/cbbga16.txt', widths=c(11,24,3,23,4,4,21))
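However, these widths are specific to this particular file. This would be easier to handle if the file had some kind of delimiter. My guess is that it was lost when the data was converted to a text file.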
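To try the read.fwf approach without downloading anything, here is a minimal sketch that builds two sample lines and parses them back. The widths are the ones from the call above. The helper pad_right, the column names, and the left-justified layout of every field are assumptions made for illustration, and the real file may lay out its fields differently.

```r
# Widths taken from the read.fwf call above.
widths <- c(11, 24, 3, 23, 4, 4, 21)

# Hypothetical helper: pad each string on the right to the given width.
# It assumes no string is longer than its width.
pad_right <- function(x, width) paste0(x, strrep(" ", width - nchar(x)))

# Two sample rows (values as shown in the glimpse() output in this thread).
rows <- list(
  c("11/13/2015", "Washington", "77", "Texas", "71", "N", "Shanghai, China"),
  c("11/13/2015", "Johnson FL", "71", "Florida A&M", "103", "", "")
)

# Glue the padded fields together so each line is a fixed-width record.
sample_lines <- vapply(
  rows,
  function(r) paste(pad_right(r, widths), collapse = ""),
  character(1)
)

# Parse the lines. The column names are made up for this example.
games <- read.fwf(
  textConnection(sample_lines),
  widths = widths,
  col.names = c("date", "team1", "score1", "team2", "score2", "flag", "venue"),
  strip.white = TRUE,
  stringsAsFactors = FALSE
)

str(games)
```

Setting strip.white = TRUE drops the padding around the text fields, so team names come back without trailing blanks.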
Answer 1 (score: 1)
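The readr::read_table function almost works here out of the box. It scans the lines and splits variables wherever a character column is completely blank. Unfortunately, it is thrown off by the unequal line lengths.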
> fileURL <- "http://kenpom.com/cbbga16.txt"
>
> library(readr)
> library(stringr)
> library(tibble)
>
> glimpse(read_table(fileURL, col_names = FALSE))
Observations: 3,244
Variables: 7
$ X1 <chr> "11/13/2015", "11/13/2015", "11/13/2015", "11/13/2015", "...
$ X2 <chr> "Washington", "Johnson FL", "Montana St.", "Monmouth", "K...
$ X3 <chr> "77", "71", "76", "84", "62", "58", "73", "60", "52", "72...
$ X4 <chr> "Texas", "Florida A&M", "Hawaii", "UCLA", "Columbia", "Se...
$ X5 <int> 71, 103, 87, 81, 107, 56, 89, 71, 80, 78, 82, 90, 41, 86,...
$ X6 <chr> "N", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1...
$ X7 <chr> "Shanghai, China", "3/2015 Pittsburgh Bradford 49 Buf...
>
Fortunately, padding the lines to the same length is not too hard:
> tmp <- read_lines(fileURL)
> tmp <- str_pad(tmp, width = max(str_length(tmp)), side = "right")
After that, read_table is able to determine the column boundaries correctly:
> glimpse(read_table(str_c(tmp, collapse = "\n"), col_names = FALSE))
Observations: 5,952
Variables: 7
$ X1 <chr> "11/13/2015", "11/13/2015", "11/13/2015", "11/13/2015", "...
$ X2 <chr> "Washington", "Johnson FL", "Pittsburgh Bradford", "Monta...
$ X3 <chr> "77", "71", "49", "76", "65", "84", "56", "62", "50", "58...
$ X4 <chr> "Texas", "Florida A&M", "Buffalo", "Hawaii", "California"...
$ X5 <int> 71, 103, 109, 87, 97, 81, 80, 107, 70, 56, 74, 89, 63, 71...
$ X6 <chr> "N", "", "", "", "", "1", "", "", "", "", "", "", "2", ""...
$ X7 <chr> "Shanghai, China", "", "", "", "", "", "", "", "", "", ""...
>
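For anyone who wants to see the padding step in isolation, here is a small base R sketch on two made-up ragged lines. It does not use stringr. The sample lines and their spacing are hypothetical and only imitate the shape of the real file.

```r
# Hypothetical ragged input: the second line has no trailing fields,
# so it is shorter than the first.
lines <- c(
  "11/13/2015 Washington  77 Texas        71 N Shanghai, China",
  "11/13/2015 Johnson FL  71 Florida A&M 103"
)

# Line lengths differ before padding.
before <- nchar(lines)

# Pad every line on the right with spaces up to the longest line,
# which is what str_pad(..., side = "right") does above.
target <- max(before)
padded <- paste0(lines, strrep(" ", target - before))

# After padding, every line has the same length.
after <- nchar(padded)
print(before)
print(after)
```

Only trailing spaces are added, so the original text of each line is unchanged and shorter rows simply end in blank fields.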