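I am having trouble importing a text file whose columns are separated by one or more spaces, because one column holds strings with embedded spaces that must not be treated as separators.
The table has no column names and up to 9 columns. Column 6 is a string that contains spaces. Columns 4, 7, 8 and 9 are optional and are missing in some rows.
My idea was to use fixed column widths when reading the table, but I could not work out how to do that.
Here is the file URL: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt
read.table throws an error: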
> read.table("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt",sep="")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 9 elements
So I read the file in as raw lines instead:
lines <- readLines("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt")
Here is a sample of lines:
a <- c("USC00080211 29.7258 -85.0206 6.1 FL APALACHICOLA AIRPORT HCN 72220",
"USC00080228 27.2181 -81.8739 9.1 FL ARCADIA HCN ",
"USC00080236 27.1819 -81.3508 42.7 FL ARCHBOLD BIO STN ",
"USC00080369 27.5947 -81.5267 46.9 FL AVON PARK 2 W ",
"USC00080374 27.6000 -81.5000 46.0 FL AVON PARK 1 NW ",
"USC00080390 27.8500 -81.5167 38.1 FL BABSON PARK 1 ENE ",
"USC00080414 24.6589 -81.2761 0.9 FL BAHIA HONDA SP ",
"USC00080478 27.8986 -81.8433 38.1 FL BARTOW HCN ",
"ACW00011604 17.1167 -61.7833 10.1 ST JOHNS COOLIDGE FLD ",
"ACW00011647 17.1333 -61.7833 19.2 ST JOHNS ",
"AE000041196 25.3330 55.5170 34.0 SHARJAH INTER. AIRP GSN 41196"
)
tf <- tempfile(fileext=".txt")
writeLines(a,tf)
shell.exec(tf)
#read.table(tf, sep = "", ??)
Answer 0 (score: 1)
For the record:
Here is the solution I found, thanks to @bdecaf:
library(stringr)  # provides str_trim()

lines <- readLines("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt")
# Start and end character positions of the nine fixed-width columns
w <- list(c(1,12), c(13,22), c(23,31), c(32,38), c(39,41), c(42,72), c(73,76), c(77,80), c(81,87))
# Column names C.1 ... C.9
ns <- paste("C", seq_along(w), sep = ".")
# Cut each column out of every line and strip the padding
obj.list <- lapply(w, function(p) str_trim(substring(lines, p[1], p[2])))
names(obj.list) <- ns
df <- data.frame(obj.list)
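The fixed-width idea from the question can also be sketched with base R's read.fwf, which takes the column widths directly. This is only a minimal sketch under assumptions: the widths (11, 8, 9, 6, 2, 30, 3, 3 and 5 characters, with a single separator character between fields that is skipped via negative widths) and the column names are my own guesses and should be checked against the dataset's format description. The two sample lines are padded with sprintf so the example runs without network access.

```r
# ASSUMED layout: field widths, with -1 skipping the one-character gap between fields.
field_widths <- c(11, -1, 8, -1, 9, -1, 6, -1, 2, -1, 30, -1, 3, -1, 3, -1, 5)
# Hypothetical column names, chosen for illustration only.
field_names <- c("id", "lat", "lon", "elev", "state", "name", "gsn", "hcn", "wmo")

# Build padded sample lines so that every field sits at its assumed position.
fmt <- "%-11s %8.4f %9.4f %6.1f %-2s %-30s %-3s %-3s %-5s"
sample_lines <- c(
  sprintf(fmt, "USC00080211", 29.7258, -85.0206, 6.1, "FL",
          "APALACHICOLA AIRPORT", "", "HCN", "72220"),
  sprintf(fmt, "ACW00011647", 17.1333, -61.7833, 19.2, "",
          "ST JOHNS", "", "", "")
)
tf <- tempfile(fileext = ".txt")
writeLines(sample_lines, tf)

# Read by position; blank optional fields become NA after whitespace is stripped.
stations <- read.fwf(
  tf,
  widths = field_widths,
  col.names = field_names,
  strip.white = TRUE,
  comment.char = "",
  na.strings = c("", "NA"),
  stringsAsFactors = FALSE,
  colClasses = c("character", "numeric", "numeric", "numeric",
                 rep("character", 5))
)
str(stations)
```

To try it on the real file, replace tf with the FTP URL. Reading the identifier columns as character keeps codes such as the last column from being converted to numbers.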