Convert rows of unequal length into a single column

Time: 2014-03-17 21:17:15

Tags: r

I have a dataset that looks like this.

MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014
1941  1 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  2 28    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

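Each station's data begins with a description of the station: code, name, latitude, and so on. In the data rows that follow, the first column is the year, the second is the month, the third is the number of days in that month, and the remaining values are the daily precipitation values for that month.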

There are 860 stations in this single dataset. How can I convert it to the following format in R?

Station Code    Name    Lat Long    Year    Month   Precip
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0

... and so on

Edit: here is a link to the dataset: https://www.dropbox.com/s/o0yp1pe4rze8amd/gdcn_SWUS.txt

Here are some snippets ...

1940 10 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1940 11 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1940 12 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  1 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  2 28    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

...

2014  9 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 10 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 11 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 12 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
MX000003068 CIUDAD CONSTITUCION       24.9500 -111.7000   48.0 1957 2014
1957  1 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1957  2 28-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1957  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  190   80    0    0    0    0    0    0    0    0    0    0    0    0
1957  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1957  5 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1957  6 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1957  7 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1957  8 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0   50    0    0    0    0   50    0   50    0    0    0    0    0    5    0    0    0

...

2014  9 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 10 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 11 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 12 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
USC00040983 BORREGO DESERT PARK       33.2314 -116.4144  245.4 1942 2014
1942  1 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942  2 28-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942  3 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942  4 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942  5 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942  6 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

1 answer:

Answer 0 (score: 0)

Use readLines to import the data into R:

dat <- readLines( filename )

Then get the lines that hold the station name and latitude/longitude:

stations <- dat[ grep( "[[:alpha:]]{2}", dat) ] 

Identify the line numbers of those header lines, which mark where each station's data block begins:

breaks <- grep( "[[:alpha:]]{2}", dat)
 breaks
#[1]  1  6 11

Build the sequence of breaks, appending an end marker one past the last line:

breaks <- c(breaks, length(dat)+1 )
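To see why the extra end marker helps, here is a small sketch that turns the break positions into start/end line ranges, one per station. The vector `toy` and the data frame `ranges` are made up for illustration and are not part of the original answer.

```r
# Toy input: two header lines (they contain letters) and their data lines
toy <- c("AA000000001 FIRST  1.0 2.0 3.0 1941 1942",
         "1941  1 31    0",
         "1941  2 28    0",
         "AA000000002 SECOND 1.0 2.0 3.0 1941 1942",
         "1941  1 31    0")

breaks <- grep("[[:alpha:]]{2}", toy)   # positions of the header lines
breaks <- c(breaks, length(toy) + 1)    # end marker one past the last line

# For station i, the data lines run from breaks[i] + 1 to breaks[i + 1] - 1
ranges <- data.frame(start = head(breaks, -1) + 1,
                     end   = tail(breaks, -1) - 1)
ranges
```

Without the end marker, the last station would have no upper bound for its block.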

Then read in the data between the breaks and let R's recycling ("automatic repetition") replicate the station data:

newdf <- lapply( seq_along(breaks[-1]), 
             function(idx){ 
        data.frame( stations[idx],
                    read.table(text=dat[(breaks[idx]+1):(breaks[idx+1]-1)], fill=TRUE))})
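One caveat: in the snippets from the question, missing values appear as -9999 with no space in front of them, so whitespace-based parsing with read.table would not separate those fields. A possible alternative is read.fwf, assuming the layout is fixed-width: 4 characters for the year, 3 for the month, 3 for the day count, then 5 per daily value. These widths are inferred from the snippets, not from any format documentation, and the sample lines and column names below are made up for illustration.

```r
# Two data lines modeled on the layout shown in the question
lines <- c(
  paste0("1957  2 28", strrep("-9999", 28)),
  paste0("1957  3 31", strrep("    0", 17), "  190", "   80", strrep("    0", 12))
)

# Assumed field widths: year, month, day count, then 31 daily values
widths <- c(4, 3, 3, rep(5, 31))

con <- textConnection(lines)
blk <- read.fwf(con, widths = widths,
                col.names = c("Year", "Month", "Days", paste0("D", 1:31)),
                na.strings = "-9999")   # treat the -9999 sentinel as missing
close(con)

blk[, 1:6]
```

Months shorter than 31 days produce short lines; the fields past the end of the line come back as NA, much like fill=TRUE does in read.table.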

Then bind the rows back together:

  newdf2 <- do.call(rbind, newdf)

Test data:

dat <- readLines( textConnection("MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014
1941  1 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  2 28    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
MX000003036 Laredo                    27.0170  112.3330    7.0 1938 2014
1941  1 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  2 28    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
MX000003037 Another                    28.0170  113.3330    7.0 1938 2014
1941  1 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  2 28    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0"))

Output (not finished yet, but it should work if you first pass stations to read.table and then do the lapply(cbind(..)) step):

 ##   stations <- read.table(text=stations)
 # Remove column 7 and add desired row names
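One caveat with passing the station lines to read.table: several station names in the question's data contain spaces (for example CIUDAD CONSTITUCION), so splitting on whitespace would yield a varying number of columns. A minimal sketch of one workaround follows. It assumes every header line consists of a code, a name, latitude, longitude, and then three more numeric fields. The helper name `parse_station` and the output column names are made up for illustration.

```r
# Hypothetical helper: the first token is the code, the last five tokens are
# numeric fields, and whatever lies in between is the (possibly multi-word) name
parse_station <- function(line) {
  tok <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  n <- length(tok)
  data.frame(Code = tok[1],
             Name = paste(tok[2:(n - 5)], collapse = " "),
             Lat  = as.numeric(tok[n - 4]),
             Long = as.numeric(tok[n - 3]),
             stringsAsFactors = FALSE)
}

hdr <- c("MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014",
         "MX000003068 CIUDAD CONSTITUCION       24.9500 -111.7000   48.0 1957 2014")
st <- do.call(rbind, lapply(hdr, parse_station))
st
```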

> newdf               ### the unfinished version
                                                               stations.idx.   V1
1   MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014 1941
2   MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014 1941
3   MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014 1941
4   MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014 1941
5   MX000003036 Laredo                    27.0170  112.3330    7.0 1938 2014 1941
6   MX000003036 Laredo                    27.0170  112.3330    7.0 1938 2014 1941
7   MX000003036 Laredo                    27.0170  112.3330    7.0 1938 2014 1941
8   MX000003036 Laredo                    27.0170  112.3330    7.0 1938 2014 1941
9  MX000003037 Another                    28.0170  113.3330    7.0 1938 2014 1941
10 MX000003037 Another                    28.0170  113.3330    7.0 1938 2014 1941
11 MX000003037 Another                    28.0170  113.3330    7.0 1938 2014 1941
12 MX000003037 Another                    28.0170  113.3330    7.0 1938 2014 1941
   V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23
1   1 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
2   2 28  0  0  0 10  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
3   3 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4   4 30  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
5   1 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
6   2 28  0  0  0 10  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
7   3 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
8   4 30  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
9   1 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
10  2 28  0  0  0 10  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
11  3 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
12  4 30  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34
1    0   0   0   0   0   0   0   0   0   0   0
2    0   0   0   0   0   0   0   0  NA  NA  NA
3    0   0   0   0   0   0   0   0   0   0   0
4    0   0   0   0   0   0   0   0   0   0  NA
5    0   0   0   0   0   0   0   0   0   0   0
6    0   0   0   0   0   0   0   0  NA  NA  NA
7    0   0   0   0   0   0   0   0   0   0   0
8    0   0   0   0   0   0   0   0   0   0  NA
9    0   0   0   0   0   0   0   0   0   0   0
10   0   0   0   0   0   0   0   0  NA  NA  NA
11   0   0   0   0   0   0   0   0   0   0   0
12   0   0   0   0   0   0   0   0   0   0  NA
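To get from this wide layout to the one-row-per-day format requested in the question, one possible completion is sketched below. It is not the original author's finished code. It assumes fixed-width data lines (widths 4, 3, 3, then 5 per day, inferred from the snippets), treats -9999 as missing, and adds a Day column that the question did not ask for and that can be dropped. The small `dat` vector is made up for illustration.

```r
dat <- c(
  "MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014",
  paste0("1941  1 31", strrep("    0", 31)),
  paste0("1941  2 28", strrep("    0", 3), "   10", strrep("    0", 24)),
  "MX000003068 CIUDAD CONSTITUCION       24.9500 -111.7000   48.0 1957 2014",
  paste0("1957  2 28", strrep("-9999", 28))
)

breaks <- c(grep("[[:alpha:]]{2}", dat), length(dat) + 1)

long <- do.call(rbind, lapply(seq_len(length(breaks) - 1), function(i) {
  # Header: first token is the code, last five are numeric, the rest is the name
  tok <- strsplit(trimws(dat[breaks[i]]), "[[:space:]]+")[[1]]
  n <- length(tok)

  # Data block, read as fixed width (assumed widths)
  con <- textConnection(dat[(breaks[i] + 1):(breaks[i + 1] - 1)])
  blk <- read.fwf(con, widths = c(4, 3, 3, rep(5, 31)), na.strings = "-9999")
  close(con)

  # Transpose to 31 x months so that column-major order runs month by month
  days <- t(as.matrix(blk[, 4:34]))
  storage.mode(days) <- "double"
  # Keep only the real days of each month (V3 holds the day count)
  keep <- row(days) <= blk$V3[col(days)]

  data.frame(Code   = tok[1],
             Name   = paste(tok[2:(n - 5)], collapse = " "),
             Lat    = as.numeric(tok[n - 4]),
             Long   = as.numeric(tok[n - 3]),
             Year   = blk$V1[col(days)[keep]],
             Month  = blk$V2[col(days)[keep]],
             Day    = row(days)[keep],
             Precip = days[keep],
             stringsAsFactors = FALSE)
}))

head(long)
```

With 860 stations, calling rbind once on the full list, as here, is generally preferable to growing a data frame inside a loop.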