There is a website with a table of Australian weather stations that I would like to load into an R data.frame. The first few rows, excluding the header, look like this:
23034 ADELAIDE AIRPORT -34.9524 138.5204 Apr 1995 Mar 2012 16.7 81 36.8 Y
23046 ADELAIDE AIRPORT OLD SITE -34.9566 138.5356 Aug 2002 Jan 2005 2.4 89 37.8 Y
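It looks like a tab-delimited file, but when I save it as stations.txt and try read.delim, read.table or readLines, I end up with everything in a single column.
I also tried copying and pasting into Excel, but none of the delimiter options split the data correctly.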
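One quick way to check whether the file really is tab-delimited is to look for tab characters and compare line lengths. The sketch below runs those checks on two made-up lines built with sprintf(); the padding widths are assumptions standing in for the real stations.txt, not copied from it.

```r
# Hypothetical stand-in for two lines of stations.txt. The fields are padded
# with spaces to fixed positions; the exact widths are assumed, not copied
# from the BoM file.
lines <- c(
  sprintf("%-7s %-39s %9s %9s", "23034", "ADELAIDE AIRPORT", "-34.9524", "138.5204"),
  sprintf("%-7s %-39s %9s %9s", "23046", "ADELAIDE AIRPORT OLD SITE", "-34.9566", "138.5356")
)

# There is no tab character anywhere, so a tab-delimited reader has
# nothing to split on
has_tabs <- any(grepl("\t", lines, fixed = TRUE))

# read.delim() splits on "\t" only, so each whole line lands in one column
as_delim <- read.delim(text = lines, header = FALSE)

# Equal line lengths with space padding are a typical sign of fixed-width data
line_widths <- nchar(lines)

has_tabs        # FALSE
ncol(as_delim)  # 1
line_widths     # 67 67
```

On the real file, the same checks would start from lines <- readLines("stations.txt").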
Answer 0 (score: 4)
# set filepaths & widths
fn <- "http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_269.txt"
file.widths <- c( 7 , -1 , 39 , -1 , 9 , -1 , 9 , -1 , 8 , -1 , 8 , -1 , 6 , -1 , 4 , -1 , 5 , -1 , 3 )
# note: finding the file.widths is often the most
# annoying part of reading in an ASCII data set
# if you have a SAS import script,
# check ?parse.SAScii in the R SAScii package :)
# find the headers
headers <-
    read.fwf(
        fn ,
        widths = file.widths ,
        skip = 2 ,
        colClasses = "character" ,
        nrows = 1
    )
# remove spaces from column names
# and convert it to a character vector
cn <- gsub( " " , "" , headers[ 1 , ] )
# the % isn't a valid column name, so change that
cn[ 8 ] <- 'Pct'
# read everything in..
yourdata <-
    read.fwf(
        fn ,
        widths = file.widths ,
        skip = 4 ,
        comment.char = "" ,
        nrows = 535 ,
        col.names = cn
    )
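The negative entries in file.widths (the -1s) tell read.fwf() to skip a column, here the single blank between fields. The sketch below mirrors the header-then-data approach above on a few made-up lines. The layout, the helper name read_fixed, and the use of strip.white = TRUE (a read.table() argument passed through read.fwf()'s ...) are assumptions for illustration only.

```r
# Hypothetical fixed-width text: a header row plus two data rows, each field
# padded to a fixed width and followed by one blank separator column.
fmt <- "%-7s %-26s %9s %4s"
fwf_lines <- c(
  sprintf(fmt, "Site", "Name", "Lat", "%"),
  sprintf(fmt, "23034", "ADELAIDE AIRPORT", "-34.9524", "81"),
  sprintf(fmt, "23046", "ADELAIDE AIRPORT OLD SITE", "-34.9566", "89")
)

# Negative widths tell read.fwf() to skip those columns (the separators)
w <- c(7, -1, 26, -1, 9, -1, 4)

# Hypothetical helper: run read.fwf() on in-memory lines and close the
# text connection afterwards
read_fixed <- function(lines, ...) {
  con <- textConnection(lines)
  on.exit(close(con))
  read.fwf(con, ...)
}

# Header row only, kept as character; then strip the padding spaces
hdr <- read_fixed(fwf_lines, widths = w, nrows = 1, colClasses = "character")
cn <- gsub(" ", "", as.character(unlist(hdr[1, ])))
cn[cn == "%"] <- "Pct"  # '%' is not a syntactic column name

# Data rows: skip the header line and trim padding from character fields
stations <- read_fixed(fwf_lines, widths = w, skip = 1, col.names = cn,
                       strip.white = TRUE, stringsAsFactors = FALSE)
str(stations)
```

strip.white = TRUE trims the padding that read.fwf() otherwise leaves on character fields, such as the spaces visible around the Name levels in the str() output further down.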
Answer 1 (score: 3)
Old-fashioned punch-card formatting... fixed width. Use the read.fwf function in utils:
df2 <- read.fwf(textConnection(" 23034 ADELAIDE AIRPORT -34.9524 138.5204 Apr 1995 Mar 2012 16.7 81 36.8 Y
23046 ADELAIDE AIRPORT OLD SITE -34.9566 138.5356 Aug 2002 Jan 2005 2.4 89 37.8 Y"), widths =c(49,9,9,9,9,7,7,6,2) )
df2
#-----------------------
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 23034 ADELAIDE AIRPORT -34.9524 138.5204 Apr 1995 Mar 2012 16.7 81 36.8 Y
2 23046 ADELAIDE AIRPORT OLD SITE -34.9566 138.5356 Aug 2002 Jan 2005 2.4 89 37.8 Y
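read.fwf() treats widths as consecutive slices of each line, so each field's start and end columns are cumulative sums of the widths. Printing those positions makes it easier to check a widths vector against the raw text or its column headers. The helper name fwf_positions below is hypothetical; it is a small sketch, not part of utils.

```r
# Hypothetical helper: start and end character positions implied by a
# read.fwf() widths vector (negative widths mark skipped columns)
fwf_positions <- function(widths) {
  ends <- cumsum(abs(widths))
  starts <- ends - abs(widths) + 1
  keep <- widths > 0
  data.frame(start = starts[keep], end = ends[keep])
}

# Positions implied by the widths used in this answer
pos <- fwf_positions(c(49, 9, 9, 9, 9, 7, 7, 6, 2))
pos

# Cutting a line at such positions with substring(); skipped columns vanish
p <- fwf_positions(c(3, -2, 5))
substring("abcdefghij", p$start, p$end)  # "abc" "fghij"
```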
Anthony deserves the check mark: his answer codes the fwf input more cleanly. This is what I was about to post:
df2 <- read.fwf(url("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_269.txt"),
col.names= c('Site','Name', 'Lat', 'Lon','Start', 'End', 'Years', 'pct', 'Obs',
'AWS'), widths=c(7,41,9,9,9,9,7,7,6,2) , row.names=NULL, skip=4, nrows=535, comment.char="")
str(df2)
#-----------------
'data.frame': 535 obs. of 10 variables:
$ Site : int 23034 23046 23090 90180 9999 9741 68241 72160 15590 33295 ...
$ Name : Factor w/ 533 levels " ADELAIDE (KENT TOWN) ",..: 2 3 1 4 5 6 7 8 9 10 ...
$ Lat : num -35 -35 -34.9 -38.5 -34.9 ...
$ Lon : num 139 139 139 144 118 ...
$ Start: Factor w/ 231 levels "0 Aug 200","0 Dec 199",..: 86 134 135 84 194 49 9 206 7 77 ...
$ End : Factor w/ 72 levels "0 Aug 201","0 Jul 200",..: 39 13 26 9 16 39 70 21 6 56 ...
$ Years: Factor w/ 65 levels "0 0.","0 1.",..: 32 47 33 36 16 33 28 35 34 31 ...
$ pct : Factor w/ 173 levels "0 24 ","0 26 ",..: 116 66 59 108 51 41 153 15 139 27 ...
$ Obs : Factor w/ 262 levels " 1.0 "," 1.6 ",..: 145 151 223 242 230 216 141 130 138 81 ...
$ AWS : logi NA NA NA NA NA NA ...
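Levels such as "0 Aug 200" under Start in the str() output above suggest that, from Lon onward, these widths cut each field one character to the left of where it starts: Start picks up the last digit of the longitude and loses the last digit of the year. The sketch below reproduces that effect on one synthetic line laid out with the first answer's widths. The padding and justification are assumptions, and apart from the values quoted in the question the line is made up.

```r
# One synthetic station line in the layout implied by the first answer's
# widths (7, 39, 9, 9, 8, 8, 6, 4, 5, 3 with one-space gaps).
# Padding and justification are assumed, not copied from the BoM file.
line <- sprintf("%-7s %-39s %9s %9s %-8s %-8s %6s %4s %5s %3s",
                "23034", "ADELAIDE AIRPORT", "-34.9524", "138.5204",
                "Apr 1995", "Mar 2012", "16.7", "81", "36.8", "Y")

cols <- c("Site", "Name", "Lat", "Lon", "Start", "End",
          "Years", "pct", "Obs", "AWS")
w_first  <- c(7, -1, 39, -1, 9, -1, 9, -1, 8, -1, 8, -1, 6, -1, 4, -1, 5, -1, 3)
w_second <- c(7, 41, 9, 9, 9, 9, 7, 7, 6, 2)

# Read the same line with a given widths vector, trimming padding
read_line <- function(widths) {
  con <- textConnection(line)
  on.exit(close(con))
  read.fwf(con, widths = widths, col.names = cols,
           strip.white = TRUE, stringsAsFactors = FALSE)
}

a <- read_line(w_first)
b <- read_line(w_second)

a$Start  # "Apr 1995"
b$Start  # "4 Apr 199": last digit of Lon plus a truncated date
```

If this reading of the layout is right, passing the first answer's widths (with their negative skip columns) together with these col.names should give clean Start and End strings.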