How to create a schema table for reading data when the schema follows a pattern

Time: 2014-04-18 15:31:18

Tags: r

I am trying to create a schema table so that I can read some data with read.fwf. The idea is to do the same thing as in "loading space separated data on R?".

My problem is that the schema is much more complex, and writing it out by hand seems like a bad idea.

This is basically what I want to write:

Schema <- read.table(text = "ID            1-11   Character
                            YEAR         12-15   Integer
                            MONTH        16-17   Integer
                            ELEMENT      18-21   Character
                            VALUE1       22-26   Integer
                            MFLAG1       27-27   Character
                            QFLAG1       28-28   Character
                            SFLAG1       29-29   Character
                            VALUE2       30-34   Integer
                            MFLAG2       35-35   Character
                            QFLAG2       36-36   Character
                            SFLAG2       37-37   Character
                            VALUE3       38-42   Integer
                            MFLAG3       43-43   Character
                            QFLAG3       44-44   Character
                            SFLAG3       45-45   Character
                            VALUE4       46-50   Integer
                            MFLAG4       51-51   Character
                            QFLAG4       52-52   Character
                            SFLAG4       53-53   Character
                            .           .          .
                            .           .          .
                            .           .          .
                            VALUE31    262-266   Integer
                            MFLAG31    267-267   Character
                            QFLAG31    268-268   Character
                            SFLAG31    269-269   Character", 
                     header = FALSE, stringsAsFactors = FALSE)

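I think it should be possible to generate this with a loop and paste, since each day repeats the same four fields in an 8-character block. For example, the next entry would be VALUE5 54-58 Integer, and so on.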

Do I have the right idea? Can someone show me how to implement it?

Many thanks!

1 answer:

Answer 0: (score: 1)

Fortunately, the structure of this schema is very predictable. You really just need to be comfortable with rep to piece together what you need.

Here are all the pieces we need:

Widths <- c(11, 4, 2, 4, rep(c(5, 1, 1, 1), times = 31))

Names <- c("ID", "YEAR", "MONTH", "ELEMENT", 
           paste0(c("VALUE", "MFLAG", "QFLAG", "SFLAG"), 
                  rep(1:31, each = 4)))

Classes <- c("character", "integer", "character", "character",
             rep(c("integer", "character", "character", "character"), 
                 times = 31))
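As a quick sanity check, the start and end positions implied by the widths can be rebuilt with cumsum and compared against the ranges in the question. This is only a sketch; `Start`, `End` and `SchemaCheck` are names made up here for illustration.

```r
Widths <- c(11, 4, 2, 4, rep(c(5, 1, 1, 1), times = 31))
Names <- c("ID", "YEAR", "MONTH", "ELEMENT",
           paste0(c("VALUE", "MFLAG", "QFLAG", "SFLAG"),
                  rep(1:31, each = 4)))

# The end position of each field is the running total of the widths;
# the start position is the end position minus the width, plus one.
End <- cumsum(Widths)
Start <- End - Widths + 1

# Hypothetical helper table: one row per field with its "start-end" range.
SchemaCheck <- data.frame(Name = Names,
                          Range = paste(Start, End, sep = "-"),
                          stringsAsFactors = FALSE)

head(SchemaCheck, 8)  # ID through SFLAG1
tail(SchemaCheck, 4)  # VALUE31 through SFLAG31
```

If the last four rows show 262-266, 267-267, 268-268 and 269-269, the widths line up with the schema in the question.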

I would suggest using "Classes" since we have it, because declaring the column types up front helps read.fwf process the file faster.

Now, let's put these pieces to use:

out <- read.fwf(
    "ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/gsn/AQW00061705.dly",
    widths = Widths, header = FALSE, col.names = Names,  colClasses = Classes)
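If you want to try the same call without touching the FTP server, here is a minimal offline sketch. It assumes a cut-down layout with only 2 days per row instead of 31, and the station ID and values are made up; all the `Mini*` names and `Lines` are hypothetical.

```r
# Hypothetical miniature schema: same leading fields, but only 2 days per row.
MiniWidths <- c(11, 4, 2, 4, rep(c(5, 1, 1, 1), times = 2))
MiniNames <- c("ID", "YEAR", "MONTH", "ELEMENT",
               paste0(c("VALUE", "MFLAG", "QFLAG", "SFLAG"),
                      rep(1:2, each = 4)))
MiniClasses <- c("character", "integer", "character", "character",
                 rep(c("integer", "character", "character", "character"),
                     times = 2))

# Two made-up records, padded to the exact field widths with sprintf.
Fmt <- "%-11s%4d%2s%-4s%5d%1s%1s%1s%5d%1s%1s%1s"
Lines <- c(
  sprintf(Fmt, "XX000000001", 1966L, "04", "TMAX",
          328L, " ", " ", "0", 311L, " ", " ", "0"),
  sprintf(Fmt, "XX000000001", 1966L, "04", "TMIN",
          222L, " ", " ", "0", -9999L, " ", " ", " ")
)

# read.fwf also accepts a connection, so the lines can be read from memory.
con <- textConnection(Lines)
Mini <- read.fwf(con, widths = MiniWidths, header = FALSE,
                 col.names = MiniNames, colClasses = MiniClasses)
close(con)

str(Mini)
```

Because MONTH is read as character, the leading zero in "04" is kept, which is the same behaviour as in the full example.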

And let's look at the structure of the first few and last few columns of out to see whether it matches what we expect:

str(out[c(1:10, 120:128)])
# 'data.frame':  10245 obs. of  19 variables:
#  $ ID     : chr  "AQW00061705" "AQW00061705" "AQW00061705" "AQW00061705" ...
#  $ YEAR   : int  1966 1966 1966 1966 1966 1966 1966 1966 1966 1966 ...
#  $ MONTH  : chr  "04" "04" "04" "04" ...
#  $ ELEMENT: chr  "TMAX" "TMIN" "PRCP" "SNOW" ...
#  $ VALUE1 : int  328 222 117 0 0 40 60 48 342 315 ...
#  $ MFLAG1 : chr  " " " " " " " " ...
#  $ QFLAG1 : chr  " " " " " " " " ...
#  $ SFLAG1 : chr  "0" "0" "0" "0" ...
#  $ VALUE2 : int  311 233 168 0 0 60 90 56 402 270 ...
#  $ MFLAG2 : chr  " " " " " " " " ...
#  $ SFLAG29: chr  "0" "0" "0" "0" ...
#  $ VALUE30: int  300 239 216 0 0 90 90 9 60 135 ...
#  $ MFLAG30: chr  " " " " " " " " ...
#  $ QFLAG30: chr  " " " " " " " " ...
#  $ SFLAG30: chr  "0" "0" "0" "0" ...
#  $ VALUE31: int  -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 ...
#  $ MFLAG31: chr  " " " " " " " " ...
#  $ QFLAG31: chr  " " " " " " " " ...
#  $ SFLAG31: chr  " " " " " " " " ...
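Note that VALUE31 is -9999 in these April rows, which have no 31st day. Assuming -9999 is the file's missing-value marker (worth confirming against the dataset's documentation), one possible follow-up is to turn it into NA in the VALUE columns. The sketch below uses a small made-up data frame `Demo` rather than `out`, so it runs on its own.

```r
# Made-up stand-in for a few columns of the real data.
Demo <- data.frame(VALUE30 = c(300L, 239L),
                   VALUE31 = c(-9999L, -9999L),
                   SFLAG31 = c(" ", " "),
                   stringsAsFactors = FALSE)

# Pick out only the VALUE columns by name, leaving the flag columns alone.
ValueCols <- grep("^VALUE", names(Demo))

# Replace the assumed missing-value marker with NA, keeping the integer type.
Demo[ValueCols] <- lapply(Demo[ValueCols], function(x) {
  replace(x, x == -9999L, NA_integer_)
})

str(Demo)
```

The same three lines should apply to `out` unchanged, since the column selection is done by name pattern rather than by position.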