I have a dataset of the following form:
# prate [mm/day] from 4x Daily NOAA-CIRES 20th Century Reanalysis V2c
# grid point lon,lat = 22.500 63.808
1851 1.8 0.9 1.7 1.5 1.6 2.7 2.7 2.6 1.3 2.5 1.8 1.7
1852 2.2 1.6 0.9 1.4 1.6 2.5 2.4 2.0 1.8 2.3 1.9 1.5
...
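I want to extract the longitude and latitude from the comment at the top of this text file and add them as two extra columns, repeated on every row of the dataset. So my output should look like this: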
# prate [mm/day] from 4x Daily NOAA-CIRES 20th Century Reanalysis V2c
# grid point lon,lat = 22.500 63.808
1851 1.8 0.9 1.7 1.5 1.6 2.7 2.7 2.6 1.3 2.5 1.8 1.7 22.500 63.808
1852 2.2 1.6 0.9 1.4 1.6 2.5 2.4 2.0 1.8 2.3 1.9 1.5 22.500 63.808
...
Does anyone have an idea how to do this?
Answer 0 (score: 2)
Use:
dat <- read.table('dataset.txt', header = FALSE, skip = 2)
txt <- readLines('dataset.txt', n = 2)
llcols <- read.table(text = trimws(gsub('.*=','',txt[2])), header = FALSE)
names(llcols) <- c('lon','lat')
dat <- cbind(dat, llcols)
This gives:
> dat
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 lon lat
1 1851 1.8 0.9 1.7 1.5 1.6 2.7 2.7 2.6 1.3 2.5 1.8 1.7 22.5 63.808
2 1852 2.2 1.6 0.9 1.4 1.6 2.5 2.4 2.0 1.8 2.3 1.9 1.5 22.5 63.808
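To try the steps above without the asker's dataset.txt, a minimal sketch can write the two sample rows from the question to a temporary file (the tempfile path is an assumption standing in for dataset.txt) and run the same five lines on it:

```r
# Write the sample header and two data rows from the question to a
# throwaway file; 'path' stands in for 'dataset.txt'.
path <- tempfile(fileext = '.txt')
writeLines(c(
  '# prate [mm/day] from 4x Daily NOAA-CIRES 20th Century Reanalysis V2c',
  '# grid point lon,lat = 22.500 63.808',
  '1851 1.8 0.9 1.7 1.5 1.6 2.7 2.7 2.6 1.3 2.5 1.8 1.7',
  '1852 2.2 1.6 0.9 1.4 1.6 2.5 2.4 2.0 1.8 2.3 1.9 1.5'
), path)

# Same steps as the answer: data rows, header lines, lon/lat, then cbind.
dat <- read.table(path, header = FALSE, skip = 2)
txt <- readLines(path, n = 2)
llcols <- read.table(text = trimws(gsub('.*=', '', txt[2])), header = FALSE)
names(llcols) <- c('lon', 'lat')
dat <- cbind(dat, llcols)

# The single lon/lat row is recycled down every data row.
print(dat)
```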
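Explanation:
With dat <- read.table('dataset.txt', header = FALSE, skip = 2), you read the data and skip the two comment lines.
With txt <- readLines('dataset.txt', n = 2), you read the two comment lines as text.
trimws(gsub('.*=','',txt[2])) extracts the lon/lat values by removing everything up to and including the "=" and trimming the surrounding whitespace; read.table then reads that string into a one-row data frame.
With cbind, you combine the two data frames into one. The single lon/lat row is recycled down to the last row of dat.
Reading a bunch of files can be done as follows: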
filenames <- list.files(pattern = '\\.txt$')  # regex: names ending in a literal ".txt"
dflist <- lapply(filenames, function(x) {
dat <- read.table(x, header = FALSE, skip = 2)
txt <- readLines(x, n = 2)
llcols <- read.table(text = trimws(gsub('.*=','',txt[2])), header = FALSE)
names(llcols) <- c('lon','lat')
cbind(dat,llcols)
})
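If the goal is one combined data frame rather than a list, one option (an addition, not part of the answer above) is to stack dflist with do.call(rbind, ...). The sketch assumes every file has the same 13 data columns; the temporary directory, the two file names and the second file's coordinates (24.375 63.808) are made up for the demo:

```r
# Throwaway directory with two made-up files; the second file's
# coordinates and data row are invented for illustration.
dir <- tempfile('prate_demo')
dir.create(dir)
hdr <- '# prate [mm/day] from 4x Daily NOAA-CIRES 20th Century Reanalysis V2c'
writeLines(c(hdr, '# grid point lon,lat = 22.500 63.808',
             '1851 1.8 0.9 1.7 1.5 1.6 2.7 2.7 2.6 1.3 2.5 1.8 1.7'),
           file.path(dir, 'a.txt'))
writeLines(c(hdr, '# grid point lon,lat = 24.375 63.808',
             '1851 2.2 1.6 0.9 1.4 1.6 2.5 2.4 2.0 1.8 2.3 1.9 1.5'),
           file.path(dir, 'b.txt'))

# Same per-file function as the answer; full.names = TRUE because the
# files are not in the working directory.
filenames <- list.files(dir, pattern = '\\.txt$', full.names = TRUE)
dflist <- lapply(filenames, function(x) {
  dat <- read.table(x, header = FALSE, skip = 2)
  txt <- readLines(x, n = 2)
  llcols <- read.table(text = trimws(gsub('.*=', '', txt[2])), header = FALSE)
  names(llcols) <- c('lon', 'lat')
  cbind(dat, llcols)
})

# Stack the per-file data frames; rbind matches columns by name.
alldat <- do.call(rbind, dflist)
print(alldat)
```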
Answer 1 (score: 1)
I have now found a solution for applying this to a list of files, as follows:
files <- list.files(pattern = '\\.txt$')
dat <- lapply(files, read.table, header = FALSE, skip = 2)
txt <- lapply(files, readLines, n = 2)
llcols <- lapply(txt, function(x) {
  ll <- read.table(text = trimws(gsub('.*=', '', x[2])), header = FALSE)  # x[2]: this file's header line
  names(ll) <- c('lon', 'lat')
  ll
})
dat <- Map(cbind, dat, llcols)  # pair each data frame with its own lon/lat
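An alternative that avoids the second read.table call is to parse the header with scan() and assign lon/lat as ordinary columns. This is a swapped-in technique, not what either answer does; read_prate is a hypothetical helper name, and the sketch assumes the same two-line header layout as in the question:

```r
# Hypothetical helper: read one file and append its lon/lat columns.
read_prate <- function(path) {
  dat <- read.table(path, header = FALSE, skip = 2)
  header <- readLines(path, n = 2)[2]
  # Everything after '=' is "lon lat"; scan() turns it into a numeric vector.
  ll <- scan(text = sub('.*=', '', header), quiet = TRUE)
  dat$lon <- ll[1]  # a length-1 value is recycled down every row
  dat$lat <- ll[2]
  dat
}

# Demo on a throwaway copy of the sample data from the question.
path <- tempfile(fileext = '.txt')
writeLines(c(
  '# prate [mm/day] from 4x Daily NOAA-CIRES 20th Century Reanalysis V2c',
  '# grid point lon,lat = 22.500 63.808',
  '1851 1.8 0.9 1.7 1.5 1.6 2.7 2.7 2.6 1.3 2.5 1.8 1.7',
  '1852 2.2 1.6 0.9 1.4 1.6 2.5 2.4 2.0 1.8 2.3 1.9 1.5'
), path)
res <- read_prate(path)
print(res[, c('V1', 'lon', 'lat')])
```

For a directory of files, lapply(list.files(pattern = '\\.txt$'), read_prate) would return the same kind of list as the code above.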