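I am trying to use R to download the data at http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=2016&MONTH=08&FROM=2612&TO=2612&STNM=71203. As far as I can tell, the data is a list. I tried the XML package, but I keep getting the error 'Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'readHTMLList' for signature '"NULL"''. I get the same error with readHTMLTable(). This is how I have been calling the function: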
url = "http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=2016&MONTH=08&FROM=2612&TO=2612&STNM=71203"
mydata = readHTMLTable(url, which = 11, trim = T)
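I have also tried adding header = T and stringsAsFactors = F to the function options, as well as readLines(url), to no avail. If I only needed one of these tables I would download it by hand, but I need a large number of them. My plan is to loop over the FROM= and TO= values in the URL, once the initial call works, to reach the sounding data for different dates and times. Any help would be great.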
Answer 0 (score: 5)
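Thankfully, this is a plain-text table wrapped in a <pre> tag, so we can read in the HTML, extract the text from the <pre> tag, and then read it into a table while supplying the correct column names and types: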
library(rvest)
library(readr)

URL <- "http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=2016&MONTH=08&FROM=2612&TO=2612&STNM=71203"

# Read the page and pull the text out of the first <pre> block
pg <- read_html(URL)
dat <- html_nodes(pg, "pre")[[1]] %>% html_text()

# Skip the header lines and parse the 11 numeric columns
df <- read_table(dat, skip=5, col_types="ddddddddddd",
                 col_names=c("pres", "hght", "temp", "dwpt", "relh", "mixr",
                             "drct", "sknt", "thta", "thte", "thtv"))

dplyr::glimpse(df)
## Variables: 11
## $ pres <dbl> 1000.0, 963.0, 962.0, 955.0, 945.8, 944.0, 925.0, 912.8, 891.0, 880.8, 877.0, 850.0, 819.1...
## $ hght <dbl> 130, 456, 465, 527, 610, 626, 800, 914, 1121, 1219, 1256, 1522, 1829, 2134, 2438, 2743, 31...
## $ temp <dbl> NA, 13.2, 15.2, 18.4, 18.9, 19.0, 18.8, 18.2, 17.2, 17.4, 17.4, 15.0, 12.4, 9.8, 7.2, 4.7,...
## $ dwpt <dbl> NA, 8.8, 9.2, 8.4, 7.2, 7.0, 6.8, 6.2, 5.2, 5.3, 5.4, 4.0, 2.8, 1.6, 0.4, -0.9, -2.3, -2.5...
## $ relh <dbl> NA, 75, 67, 52, 47, 46, 46, 45, 45, 45, 45, 48, 52, 56, 62, 67, 75, 75, 73, 70, 68, 23, 17...
## $ mixr <dbl> NA, 7.43, 7.64, 7.29, 6.79, 6.70, 6.74, 6.57, 6.26, 6.40, 6.45, 6.03, 5.74, 5.46, 5.19, 4....
## $ drct <dbl> NA, 240, 247, 295, 0, 15, 175, 170, 72, 25, 22, 0, 335, 300, 290, 300, 300, 300, 300, 319,...
## $ sknt <dbl> NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 3, 5, 5, 7, 6, 6, 6, 10, 10, 21, 21, 21, 21, 21, 21, ...
## $ thta <dbl> NA, 289.4, 291.6, 295.4, 296.7, 297.0, 298.5, 299.1, 300.1, 301.2, 301.6, 301.9, 302.3, 30...
## $ thte <dbl> NA, 310.7, 313.6, 316.8, 316.9, 316.9, 318.6, 318.8, 319.0, 320.6, 321.2, 320.2, 319.8, 31...
## $ thtv <dbl> NA, 290.8, 292.9, 296.7, 298.0, 298.2, 299.7, 300.3, 301.2, 302.4, 302.8, 302.9, 303.4, 30...
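For the looping part of the question, the URL can be assembled from a timestamp rather than edited by hand. The following is only a sketch: sounding_url() is a hypothetical helper, and it assumes that FROM and TO are a two-digit day followed by a two-digit hour (DDHH, in UTC), which is how the example URL (FROM=2612 for 26 August, 12Z) appears to be laid out.

```r
# Hypothetical helper: build the sounding URL for one or more observation times.
# Assumption: FROM/TO are encoded as DDHH (day of month, then hour) in UTC.
sounding_url <- function(time, station = "71203", region = "naconf") {
  day_hour <- format(time, "%d%H", tz = "UTC")
  sprintf(
    paste0("http://weather.uwyo.edu/cgi-bin/sounding?region=%s",
           "&TYPE=TEXT%%3ALIST&YEAR=%s&MONTH=%s&FROM=%s&TO=%s&STNM=%s"),
    region,
    format(time, "%Y", tz = "UTC"),
    format(time, "%m", tz = "UTC"),
    day_hour,
    day_hour,
    station
  )
}

# One URL every 12 hours over a two-day window
times <- seq(as.POSIXct("2016-08-26 00:00", tz = "UTC"),
             as.POSIXct("2016-08-27 12:00", tz = "UTC"),
             by = "12 hours")
urls <- sounding_url(times)
```

Each element of urls could then be passed through the read_html() and read_table() steps above, for example inside lapply(), ideally with a short Sys.sleep() between requests so the server is not hammered.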
Answer 1 (score: 2)
Using the rvest and readr packages:
> txt = read_html(url) %>% html_node("pre") %>% html_text()
This gets the text from inside the <pre> tag. Then:
> data = txt %>% read_fwf(fwf_empty(.,skip=5),skip=5)
This produces a data frame:
> head(data)
      X1  X2   X3  X4 X5   X6  X7 X8    X9   X10   X11
1 1000.0 130   NA  NA NA   NA  NA NA    NA    NA    NA
2  963.0 456 13.2 8.8 75 7.43 240  1 289.4 310.7 290.8
3  962.0 465 15.2 9.2 67 7.64 247  1 291.6 313.6 292.9
4  955.0 527 18.4 8.4 52 7.29 295  1 295.4 316.8 296.7
5  945.8 610 18.9 7.2 47 6.79   0  1 296.7 316.9 298.0
6  944.0 626 19.0 7.0 46 6.70  15  1 297.0 316.9 298.2
Getting the names is left as an exercise for the reader...
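One possible way to do that exercise is to assign the names after the fact. This is a minimal sketch that borrows the column names from the first answer; sounding_names is an illustrative variable, and the small data frame below is only a stand-in for the result of read_fwf(), built from the first two rows shown above so the snippet runs without network access.

```r
# Column names borrowed from the first answer
sounding_names <- c("pres", "hght", "temp", "dwpt", "relh", "mixr",
                    "drct", "sknt", "thta", "thte", "thtv")

# Stand-in for the parsed table: the first two rows of the output above
data <- as.data.frame(rbind(
  c(1000.0, 130,   NA,  NA, NA,   NA,  NA, NA,    NA,    NA,    NA),
  c( 963.0, 456, 13.2, 8.8, 75, 7.43, 240,  1, 289.4, 310.7, 290.8)
))

# Replace the default column names with the descriptive ones
names(data) <- sounding_names
```

The same names(data) <- sounding_names line should apply unchanged to the real data frame, provided it has the same 11 columns in the same order.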