Unable to create an R data.frame from a web text table

Time: 2013-03-05 04:48:15

Tags: r read.table

A website has a table of Australian weather stations that I would like to load into an R data.frame. The first few lines, excluding the header, look like this:

  23034 ADELAIDE AIRPORT                         -34.9524  138.5204 Apr 1995 Mar 2012   16.7   81  36.8   Y
  23046 ADELAIDE AIRPORT OLD SITE                -34.9566  138.5356 Aug 2002 Jan 2005    2.4   89  37.8   Y

It looks like a tab-delimited file, but when I save it as stations.txt and try read.delim, read.table or readLines, I end up with everything in a single column.

I also tried copying and pasting into Excel, but none of the delimiter options split the data correctly.
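A quick way to see why those attempts fail is to test the two sample rows above directly in R. This sketch assumes the downloaded file is laid out like the pasted rows (runs of spaces, no tabs): read.delim() splits only on tabs, so each row stays one field, while splitting on whitespace (read.table's default sep = "") breaks station names such as "ADELAIDE AIRPORT OLD SITE" into several pieces, so the rows disagree on how many columns they have.

```r
# The two sample rows from the question, copied as pasted (spaces, no tabs).
lines <- c(
  "  23034 ADELAIDE AIRPORT                         -34.9524  138.5204 Apr 1995 Mar 2012   16.7   81  36.8   Y",
  "  23046 ADELAIDE AIRPORT OLD SITE                -34.9566  138.5356 Aug 2002 Jan 2005    2.4   89  37.8   Y"
)

# Is there any tab character at all?
has_tab <- grepl("\t", lines, fixed = TRUE)

# read.delim() uses sep = "\t", so without tabs every row is a single field.
as_delim <- read.delim(text = lines, header = FALSE)

# Splitting on runs of whitespace gives a different field count per row,
# because the station names themselves contain spaces.
con <- textConnection(lines)
n_fields <- count.fields(con)
close(con)

has_tab         # FALSE FALSE
ncol(as_delim)  # 1
n_fields        # 13 15
```

Neither separator matches the data: the columns are defined by character position, which is what the answers below rely on.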

2 answers:

Answer 0 (score: 4)

# set the file path and the column widths
fn <- "http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_269.txt"
# positive widths are fields to keep; negative widths skip that many
# characters (here, the single space between columns)
file.widths <- c( 7 , -1 , 39 , -1 , 9 , -1 , 9 , -1 , 8 , -1 , 8 , -1 , 6 , -1 , 4 , -1 , 5 , -1 , 3 )
# note: finding the file.widths is often the most
# annoying part of reading in an ASCII data set.
# if you have a SAS import script,
# check ?parse.SAScii in the R SAScii package :)


# find the headers
headers <- 
    read.fwf( 
        fn ,
        widths = file.widths ,
        skip = 2 ,
        colClasses = "character" ,
        nrows = 1
    )

# remove spaces from column names
# and convert it to a character vector
cn <- gsub( " " , "" , headers[ 1 , ] )

# the % isn't a valid column name (read.table's default check.names = TRUE
# would turn it into "X."), so give it a readable name instead
cn[ 8 ] <- 'Pct'

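# read everything in: skip = 4 jumps over the lines above the first data row,
# comment.char = "" stops a '#' inside a field from being read as a comment,
# and nrows = 535 is the station count at the time of writing (may need updating)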
yourdata <-
    read.fwf( 
        fn ,
        widths = file.widths ,
        skip = 4 ,
        comment.char = "" ,
        nrows = 535 ,
        col.names = cn
    )
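Since finding file.widths is called the annoying part, here is one low-tech way to get candidate boundaries; it is an assumption of mine, not part of the answer above. Character positions that are blank on every line are the gaps between columns. The helper blank_columns() is hypothetical, and with only the two sample rows from the question it also flags gaps inside station names (for example position 17, between ADELAIDE and AIRPORT); over the full file most of those should drop out. The second half checks file.widths offline against the same two rows, using the column names from the other answer as stand-ins for the real header.

```r
# The two sample rows from the question, copied character for character.
lines <- c(
  "  23034 ADELAIDE AIRPORT                         -34.9524  138.5204 Apr 1995 Mar 2012   16.7   81  36.8   Y",
  "  23046 ADELAIDE AIRPORT OLD SITE                -34.9566  138.5356 Aug 2002 Jan 2005    2.4   89  37.8   Y"
)

# Hypothetical helper: positions that hold a space on every line.
blank_columns <- function(lines) {
  width  <- max(nchar(lines))
  padded <- paste0(lines, strrep(" ", width - nchar(lines)))  # equal lengths
  chars  <- do.call(rbind, strsplit(padded, ""))              # one row per line
  which(colSums(chars != " ") == 0)
}
blank_columns(lines)

# Widths from the answer: positive = keep, negative = skip that many characters.
file.widths <- c(7, -1, 39, -1, 9, -1, 9, -1, 8, -1, 8, -1, 6, -1, 4, -1, 5, -1, 3)
sum(abs(file.widths))  # 107, the length of each sample row

# Stand-in column names (the real ones come from the file's header line).
cn <- c("Site", "Name", "Lat", "Lon", "Start", "End", "Years", "Pct", "Obs", "AWS")

con <- textConnection(lines)
sample_data <- read.fwf(con, widths = file.widths, col.names = cn,
                        strip.white = TRUE, stringsAsFactors = FALSE)
close(con)
str(sample_data)
```

strip.white = TRUE is an extra read.table argument (passed through by read.fwf) that trims the padding from character fields such as Name.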

Answer 1 (score: 3)

Old-style punched-card formatting.... fixed width. Use the read.fwf function in utils:
df2 <- read.fwf(textConnection("  23034 ADELAIDE AIRPORT                         -34.9524  138.5204 Apr 1995 Mar 2012   16.7   81  36.8   Y
  23046 ADELAIDE AIRPORT OLD SITE                -34.9566  138.5356 Aug 2002 Jan 2005    2.4   89  37.8   Y"), widths = c(49, 9, 9, 9, 9, 7, 7, 6, 2))
df2
#-----------------------
                                                 V1       V2       V3        V4        V5   V6 V7   V8 V9
1   23034 ADELAIDE AIRPORT                          -34.9524 138.5204  Apr 1995  Mar 2012 16.7 81 36.8  Y
2   23046 ADELAIDE AIRPORT OLD SITE                 -34.9566 138.5356  Aug 2002  Jan 2005  2.4 89 37.8  Y
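Fixed width just means every field occupies the same character positions on every line, so the widths can be turned into start and end positions and each line cut with substring(). This base-R sketch is only an illustration of the idea that read.fwf automates, applied to the widths from the example above and the two sample rows from the question:

```r
lines <- c(
  "  23034 ADELAIDE AIRPORT                         -34.9524  138.5204 Apr 1995 Mar 2012   16.7   81  36.8   Y",
  "  23046 ADELAIDE AIRPORT OLD SITE                -34.9566  138.5356 Aug 2002 Jan 2005    2.4   89  37.8   Y"
)
widths <- c(49, 9, 9, 9, 9, 7, 7, 6, 2)

# Convert widths into inclusive start/end character positions.
ends   <- cumsum(widths)
starts <- ends - widths + 1

# substring() is vectorised over first/last, so each line yields one piece
# per field; trimws() drops the padding. Result: one row per input line.
fields <- t(sapply(lines, function(x) trimws(substring(x, starts, ends)),
                   USE.NAMES = FALSE))
fields[1, ]
```

Unlike read.fwf, this leaves everything as character; converting the numeric columns would be a separate step.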

Anthony deserves the check mark; his answer specifies the fixed-width input more carefully. This is what I was about to post:

df2 <- read.fwf(url("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_269.txt"),
                col.names = c('Site', 'Name', 'Lat', 'Lon', 'Start', 'End',
                              'Years', 'pct', 'Obs', 'AWS'),
                widths = c(7, 41, 9, 9, 9, 9, 7, 7, 6, 2),
                row.names = NULL, skip = 4, nrows = 535, comment.char = "")
str(df2)
#-----------------
'data.frame':   535 obs. of  10 variables:
 $ Site : int  23034 23046 23090 90180 9999 9741 68241 72160 15590 33295 ...
 $ Name : Factor w/ 533 levels " ADELAIDE (KENT TOWN)                    ",..: 2 3 1 4 5 6 7 8 9 10 ...
 $ Lat  : num  -35 -35 -34.9 -38.5 -34.9 ...
 $ Lon  : num  139 139 139 144 118 ...
 $ Start: Factor w/ 231 levels "0 Aug 200","0 Dec 199",..: 86 134 135 84 194 49 9 206 7 77 ...
 $ End  : Factor w/ 72 levels "0 Aug 201","0 Jul 200",..: 39 13 26 9 16 39 70 21 6 56 ...
 $ Years: Factor w/ 65 levels "0    0.","0    1.",..: 32 47 33 36 16 33 28 35 34 31 ...
 $ pct  : Factor w/ 173 levels "0   24 ","0   26 ",..: 116 66 59 108 51 41 153 15 139 27 ...
 $ Obs  : Factor w/ 262 levels "  1.0 ","  1.6 ",..: 145 151 223 242 230 216 141 130 138 81 ...
 $ AWS  : logi  NA NA NA NA NA NA ...
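The str() output above hints at a problem: Start, End, Years, pct and Obs come back with truncated levels such as "0 Aug 200", and AWS is all NA. Those widths sum to 106, one less than the 107-character sample rows, and Site + Name covers 48 characters where the first example used 49, so every field from Lon onward is read one character early. A sketch checking this on the two sample rows from the question, assuming they share the downloaded file's layout; read_stations() is a hypothetical wrapper, and strip.white = TRUE is an added choice to trim padding:

```r
lines <- c(
  "  23034 ADELAIDE AIRPORT                         -34.9524  138.5204 Apr 1995 Mar 2012   16.7   81  36.8   Y",
  "  23046 ADELAIDE AIRPORT OLD SITE                -34.9566  138.5356 Aug 2002 Jan 2005    2.4   89  37.8   Y"
)
cn <- c("Site", "Name", "Lat", "Lon", "Start", "End", "Years", "pct", "Obs", "AWS")

# Hypothetical wrapper: read the sample rows with a given Name width.
read_stations <- function(name_width) {
  con <- textConnection(lines)
  on.exit(close(con))
  read.fwf(con, widths = c(7, name_width, 9, 9, 9, 9, 7, 7, 6, 2),
           col.names = cn, strip.white = TRUE, stringsAsFactors = FALSE)
}

shifted <- read_stations(41)  # widths as posted: they sum to 106
fixed   <- read_stations(42)  # Site + Name = 49, as in the first example

shifted$Start  # "4 Apr 199" "6 Aug 200"
fixed$Start    # "Apr 1995" "Aug 2002"
```

If the downloaded rows match the sample layout, using widths = c(7, 42, 9, 9, 9, 9, 7, 7, 6, 2) in the read.fwf(url(...)) call should realign the columns; the str() output shown above was produced with 41.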