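I am trying to create a schema table so that I can read some data with read.fwf. The idea is to do the same thing that was done in "loading space separated data on R?".
My problem is that my schema is much more complicated, and writing it out by hand seems like a bad idea.
This is basically what I would like to write: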
Schema <- read.table(text = "ID 1-11 Character
YEAR 12-15 Integer
MONTH 16-17 Integer
ELEMENT 18-21 Character
VALUE1 22-26 Integer
MFLAG1 27-27 Character
QFLAG1 28-28 Character
SFLAG1 29-29 Character
VALUE2 30-34 Integer
MFLAG2 35-35 Character
QFLAG2 36-36 Character
SFLAG2 37-37 Character
VALUE3 38-42 Integer
MFLAG3 43-43 Character
QFLAG3 44-44 Character
SFLAG3 45-45 Character
VALUE4 46-50 Integer
MFLAG4 51-51 Character
QFLAG4 52-52 Character
SFLAG4 53-53 Character
. . .
. . .
. . .
VALUE31 262-266 Integer
MFLAG31 267-267 Character
QFLAG31 268-268 Character
SFLAG31 269-269 Character",
header = FALSE, stringsAsFactors = FALSE)
I think it should be possible to generate the repeated rows with a loop and paste, for example VALUE5 54-58 Integer, and so on.
Do I have the right idea? Could someone show me how to implement it?
Many thanks!
Answer 0 (score: 1)
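Fortunately, the structure of this schema is very predictable. You really only need to be comfortable with rep to piece together what you need.
Here are all the pieces we need: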
# Four header fields, then (VALUE, MFLAG, QFLAG, SFLAG) repeated for 31 days
Widths <- c(11, 4, 2, 4, rep(c(5, 1, 1, 1), times = 31))
Names <- c("ID", "YEAR", "MONTH", "ELEMENT",
           paste0(c("VALUE", "MFLAG", "QFLAG", "SFLAG"),
                  rep(1:31, each = 4)))
Classes <- c("character", "integer", "character", "character",
             rep(c("integer", "character", "character", "character"),
                 times = 31))
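As a sanity check, the start and end columns implied by these widths can be derived with cumsum and compared against the layout in the question. This is only a minimal sketch: the names Start, End, and SchemaCheck are made up for the illustration and are not part of the pieces above.

```r
# Same widths and names as above, repeated so this block runs on its own
Widths <- c(11, 4, 2, 4, rep(c(5, 1, 1, 1), times = 31))
Names <- c("ID", "YEAR", "MONTH", "ELEMENT",
           paste0(c("VALUE", "MFLAG", "QFLAG", "SFLAG"),
                  rep(1:31, each = 4)))

# The end column of each field is the running total of the widths;
# the start column is the end column minus the width, plus one
End <- cumsum(Widths)
Start <- End - Widths + 1

# SchemaCheck is a hypothetical name used only for this illustration
SchemaCheck <- data.frame(Name = Names, Start = Start, End = End,
                          stringsAsFactors = FALSE)

head(SchemaCheck, 8)   # ID through SFLAG1
tail(SchemaCheck, 4)   # VALUE31 through SFLAG31
```

If the first and last rows line up with the ranges listed in the question (ID at 1-11, SFLAG31 at 269-269), the widths are very likely right for everything in between as well.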
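I recommend passing "Classes" as colClasses, since we already have it: read.fwf then does not have to guess the column types, which helps it process the file faster.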
Now, let's put these pieces to use:
out <- read.fwf(
  "ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/gsn/AQW00061705.dly",
  widths = Widths, header = FALSE, col.names = Names, colClasses = Classes)
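In case the FTP server is not reachable, the same call can be tried offline on a single synthetic record. This is just a sketch: the record below is made up (it is not real station data), and the names day_values, day_fields, record, tmp, and demo are only for the illustration.

```r
# Same pieces as above, repeated so this block runs on its own
Widths <- c(11, 4, 2, 4, rep(c(5, 1, 1, 1), times = 31))
Names <- c("ID", "YEAR", "MONTH", "ELEMENT",
           paste0(c("VALUE", "MFLAG", "QFLAG", "SFLAG"),
                  rep(1:31, each = 4)))
Classes <- c("character", "integer", "character", "character",
             rep(c("integer", "character", "character", "character"),
                 times = 31))

# Build one synthetic 269-character record (made-up values)
day_values <- sprintf("%5d", 101:131)            # 31 right-aligned 5-char values
day_fields <- paste0(day_values, " ", " ", "0")  # value + MFLAG + QFLAG + SFLAG
record <- paste0("AQW00061705", "1966", "04", "TMAX",
                 paste(day_fields, collapse = ""))
stopifnot(nchar(record) == sum(Widths))

# Write the record to a temporary file and read it back with read.fwf
tmp <- tempfile(fileext = ".dly")
writeLines(record, tmp)
demo <- read.fwf(tmp, widths = Widths, header = FALSE,
                 col.names = Names, colClasses = Classes)
unlink(tmp)

str(demo[1:8])
```

Because MONTH is read as character, the leading zero in "04" is kept; reading it as integer would be a reasonable alternative if you want to do arithmetic on it.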
And let's look at the structure of the first few and last few columns to see whether it matches what we expect:
str(out[c(1:10, 120:128)])
# 'data.frame': 10245 obs. of 19 variables:
# $ ID : chr "AQW00061705" "AQW00061705" "AQW00061705" "AQW00061705" ...
# $ YEAR : int 1966 1966 1966 1966 1966 1966 1966 1966 1966 1966 ...
# $ MONTH : chr "04" "04" "04" "04" ...
# $ ELEMENT: chr "TMAX" "TMIN" "PRCP" "SNOW" ...
# $ VALUE1 : int 328 222 117 0 0 40 60 48 342 315 ...
# $ MFLAG1 : chr " " " " " " " " ...
# $ QFLAG1 : chr " " " " " " " " ...
# $ SFLAG1 : chr "0" "0" "0" "0" ...
# $ VALUE2 : int 311 233 168 0 0 60 90 56 402 270 ...
# $ MFLAG2 : chr " " " " " " " " ...
# $ SFLAG29: chr "0" "0" "0" "0" ...
# $ VALUE30: int 300 239 216 0 0 90 90 9 60 135 ...
# $ MFLAG30: chr " " " " " " " " ...
# $ QFLAG30: chr " " " " " " " " ...
# $ SFLAG30: chr "0" "0" "0" "0" ...
# $ VALUE31: int -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 ...
# $ MFLAG31: chr " " " " " " " " ...
# $ QFLAG31: chr " " " " " " " " ...
# $ SFLAG31: chr " " " " " " " " ...
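One thing that may look odd in the output is the -9999 in VALUE31. The first rows are for April, which has only 30 days, and as far as I know GHCN-Daily documents -9999 as its missing-value code. If you want those as NA, something along these lines should work. This is a sketch on a toy data frame with made-up values (the names toy and value_cols are only for the illustration); the same lapply step could be applied to out.

```r
# Toy data frame standing in for a few columns of `out` (values are made up)
toy <- data.frame(VALUE30 = c(300L, 239L), SFLAG30 = c("0", "0"),
                  VALUE31 = c(-9999L, -9999L), SFLAG31 = c(" ", " "),
                  stringsAsFactors = FALSE)

# Find the VALUE columns and replace the assumed missing code with NA
value_cols <- grep("^VALUE", names(toy))
toy[value_cols] <- lapply(toy[value_cols], function(x) {
  replace(x, x == -9999L, NA_integer_)
})

str(toy)
```

Only the VALUE columns are touched, so the flag columns stay as character.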