I have a dataset that looks like this:
MX000003035 LORETO 26.0170 111.3330 7.0 1938 2014
1941 1 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 2 28 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 3 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 4 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
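Each station's data begins with a description of the station: code, name, latitude, and so on. In the data rows, the first column is the year, the second is the month, the third is the number of days in that month, and the remaining values are the daily precipitation values for that month.
There are 860 stations in this single dataset. How can I convert it to the following format in R?
Station Code Name Lat Long Year Month Precip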
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
MX000003035 LORETO 26.017 111.333 1941 1 0
... and so on
Edit: here is a link to the dataset: https://www.dropbox.com/s/o0yp1pe4rze8amd/gdcn_SWUS.txt
Here are some excerpts:
1940 10 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1940 11 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1940 12 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 1 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 2 28 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 3 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 4 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
2014 9 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 10 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 11 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 12 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
MX000003068 CIUDAD CONSTITUCION 24.9500 -111.7000 48.0 1957 2014
1957 1 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1957 2 28-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1957 3 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 190 80 0 0 0 0 0 0 0 0 0 0 0 0
1957 4 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1957 5 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1957 6 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1957 7 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1957 8 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50 0 0 0 0 50 0 50 0 0 0 0 0 5 0 0 0
...
2014 9 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 10 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 11 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 12 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
USC00040983 BORREGO DESERT PARK 33.2314 -116.4144 245.4 1942 2014
1942 1 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942 2 28-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942 3 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942 4 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942 5 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942 6 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
Answer 0 (score: 0)
Use readLines:
dat <- readLines( filename )
Then get the lines that hold the station name and lat/long:
stations <- dat[ grep( "[[:alpha:]]{2}", dat) ]
Identify the line numbers of those station header lines:
breaks <- grep( "[[:alpha:]]{2}", dat)
breaks
#[1]  1  6 11
Complete the sequence of breaks by appending an end marker one past the last line:
breaks <- c(breaks, length(dat)+1 )
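To make the role of the breaks concrete, here is a minimal sketch on a made-up six-line input (the station codes, names, and values are invented for illustration and are not from the real file). It shows how the header positions plus the appended end marker give the first and last data line of every station block.

```r
# Toy input: two stations with two data rows each (invented values)
dat <- c("AA000000001 FIRST 10.0 20.0 5.0 1940 1941",
         "1940 1 3 0 0 0",
         "1940 2 3 0 5 0",
         "AA000000002 SECOND 11.0 21.0 6.0 1940 1941",
         "1940 1 3 0 0 7",
         "1940 2 3 1 0 0")

# Header lines are the only ones containing two consecutive letters
breaks <- grep("[[:alpha:]]{2}", dat)

# Append a sentinel one past the last line so the final block has an end
breaks <- c(breaks, length(dat) + 1)

# First and last data line of each station block
starts <- breaks[-length(breaks)] + 1
ends   <- breaks[-1] - 1
print(data.frame(header = breaks[-length(breaks)], starts, ends))
```

Without the sentinel, the last station would have no upper bound and its rows would be lost.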
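Then read in the data between the breaks and let R's recycling replicate the station data down each block:
newdf <- lapply(seq_along(breaks[-1]), function(idx) {
  # Data lines of station idx lie strictly between two consecutive breaks
  rows <- dat[(breaks[idx] + 1):(breaks[idx + 1] - 1)]
  # The single station string is recycled to every row of the block
  data.frame(stations[idx], read.table(text = rows, fill = TRUE))
})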
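One caveat: read.table splits fields on whitespace, while the excerpts in the question contain runs such as 30-9999-9999 with no space between values. A possible workaround, sketched below on two invented lines, is to insert a space before every minus sign before parsing. Treating -9999 as a missing-value code is an assumption here; the question does not say what it means.

```r
# Two data lines in the style of the excerpts; the second has no spaces
# between the -9999 codes (values shortened for illustration)
lines <- c("1957 3 3 0 190 80",
           "1957 4 3-9999-9999-9999")

# Put a space in front of every minus sign so the fields become separable
spaced <- gsub("-", " -", lines, fixed = TRUE)

block <- read.table(text = spaced, fill = TRUE)

# Assumption: -9999 marks a missing observation, so recode it as NA
block[!is.na(block) & block == -9999] <- NA
print(block)
```

This only applies to the data lines; header lines can hold legitimate negative longitudes and are parsed separately.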
Then bind the rows back together:
newdf2 <- do.call(rbind, newdf)
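The combined table still has one row per month, whereas the question asks for one row per daily value. The following is a rough base-R sketch of that last reshaping step, run on a tiny invented table with the same layout (station text, year, month, days in month, then the day columns). The names wide, long, and Day are illustrative; the Day column is an addition that the requested format does not include and can be dropped.

```r
# Toy wide table in the shape produced above; only 3 day columns for brevity
wide <- data.frame(station = c("ST1", "ST1"),
                   V1 = c(1941, 1941),   # year
                   V2 = c(1, 2),         # month
                   V3 = c(3, 2),         # number of days in the month
                   V4 = c(0, 10), V5 = c(0, 0), V6 = c(5, NA))

day_cols <- 5:ncol(wide)                 # columns holding the daily values
n_days   <- length(day_cols)

# Repeat each month row once per day column; t() makes the values of one
# month contiguous when the matrix is flattened
long <- data.frame(
  wide[rep(seq_len(nrow(wide)), each = n_days), 1:3],
  Day    = rep(seq_len(n_days), times = nrow(wide)),
  Precip = as.vector(t(as.matrix(wide[day_cols])))
)
names(long)[1:3] <- c("Station", "Year", "Month")

# Drop the padding beyond each month's length (the NA cells added by fill = TRUE)
ndays_rep <- rep(wide$V3, each = n_days)
long <- long[long$Day <= ndays_rep, ]
rownames(long) <- NULL
print(long)
```

On the real table the same code would apply with the station column in position 1 and the day columns starting at position 5.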
Test data:
dat <- readLines( textConnection("MX000003035 LORETO 26.0170 111.3330 7.0 1938 2014
1941 1 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 2 28 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 3 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 4 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MX000003036 Laredo 27.0170 112.3330 7.0 1938 2014
1941 1 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 2 28 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 3 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 4 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MX000003037 Another 28.0170 113.3330 7.0 1938 2014
1941 1 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 2 28 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 3 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1941 4 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"))
Output (not finished yet, but if you first pass stations to read.table and then do the lapply(cbind(..)) operation, it should work fine):
## stations <- read.table(text=stations)
# Remove column 7 and add desired row names
> newdf2 ### the unfinished version
stations.idx. V1
1 MX000003035 LORETO 26.0170 111.3330 7.0 1938 2014 1941
2 MX000003035 LORETO 26.0170 111.3330 7.0 1938 2014 1941
3 MX000003035 LORETO 26.0170 111.3330 7.0 1938 2014 1941
4 MX000003035 LORETO 26.0170 111.3330 7.0 1938 2014 1941
5 MX000003036 Laredo 27.0170 112.3330 7.0 1938 2014 1941
6 MX000003036 Laredo 27.0170 112.3330 7.0 1938 2014 1941
7 MX000003036 Laredo 27.0170 112.3330 7.0 1938 2014 1941
8 MX000003036 Laredo 27.0170 112.3330 7.0 1938 2014 1941
9 MX000003037 Another 28.0170 113.3330 7.0 1938 2014 1941
10 MX000003037 Another 28.0170 113.3330 7.0 1938 2014 1941
11 MX000003037 Another 28.0170 113.3330 7.0 1938 2014 1941
12 MX000003037 Another 28.0170 113.3330 7.0 1938 2014 1941
V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23
1 1 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 28 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 4 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 1 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 2 28 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 3 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 4 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 1 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 2 28 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 3 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 4 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34
1 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 NA NA NA
3 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 NA
5 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 NA NA NA
7 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 NA
9 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 NA NA NA
11 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 NA
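Passing stations straight to read.table works for the test data, but the real file has multi-word names such as CIUDAD CONSTITUCION, which would be split into extra columns. One possible alternative, sketched here, splits each header on whitespace and treats everything between the first token and the last five tokens as the name. The helper parse_station is hypothetical, and the assumption that exactly five numeric fields follow the name is based only on the header lines shown in the question.

```r
stations <- c("MX000003035 LORETO 26.0170 111.3330 7.0 1938 2014",
              "MX000003068 CIUDAD CONSTITUCION 24.9500 -111.7000 48.0 1957 2014")

# Hypothetical helper: first token is the code, the last five tokens are
# numeric fields (lat, long, and three more), the rest is the name
parse_station <- function(line) {
  tok <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  n   <- length(tok)
  data.frame(Code = tok[1],
             Name = paste(tok[2:(n - 5)], collapse = " "),
             Lat  = as.numeric(tok[n - 4]),
             Long = as.numeric(tok[n - 3]),
             stringsAsFactors = FALSE)
}

station_df <- do.call(rbind, lapply(stations, parse_station))
print(station_df)
```

The resulting station_df has one row per station, so station_df[idx, ] could stand in for stations[idx] inside the lapply call to get separate Code, Name, Lat, and Long columns.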