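Web scraping error: unable to find an inherited method for readHTMLTable in R

Time: 2018-01-18 19:46:17

Tags: r web-scraping html-parsing

I'm working from someone else's code and have hit a problem I can't get past. When it pulls data from a site for one of the datasets in use, it fails with the error shown below, saying it cannot find an "inherited method" for readHTMLTable.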

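My goal is simply to get the dataset back. A similar piece of code for another dataset, pulled from a different website, ran fine and gave me the kind of result I am trying to get here. The failing code and its output are: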

library(XML)          # provides readHTMLTable()
library(data.table)   # provides data.table()

url <- "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml"
page <- readLines(url)
#> Warning message:
#> In readLines(url) :
#>   incomplete final line found on 'http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml'
ONI_data_raw <- data.table(readHTMLTable(page, which = 8))
#> Error in (function (classes, fdef, mtable)  : 
#>   unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’

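Any suggestions are greatly appreciated.

1 answer:

Answer 0: (score: 2)

If you look at the HTML you get back from the first URL, it tells you that the site has moved and gives you the new URL. (If you view it in a browser, it will probably redirect you automatically.) Using the URL it redirects to, and rvest for the scraping: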

library(rvest)

h <- read_html('http://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php')

episodes <- h %>% 
    html_node('table[border="1"]') %>%    # get first table node with a border attribute of "1"
    html_table(header = TRUE, fill = TRUE) %>%    # parse the table
    dplyr::filter(Year != 'Year') %>%    # remove interior header rows
    readr::type_convert()    # convert types from character

str(episodes)
#> 'data.frame':    68 obs. of  13 variables:
#>  $ Year: num  1950 1951 1952 1953 1954 ...
#>  $ DJF : num  -1.5 -0.8 0.5 0.4 0.8 -0.7 -1.1 -0.2 1.8 0.6 ...
#>  $ JFM : num  -1.3 -0.5 0.4 0.6 0.5 -0.6 -0.8 0.1 1.7 0.6 ...
#>  $ FMA : num  -1.2 -0.2 0.3 0.6 0 -0.7 -0.6 0.4 1.3 0.5 ...
#>  $ MAM : num  -1.2 0.2 0.3 0.7 -0.4 -0.8 -0.5 0.7 0.9 0.3 ...
#>  $ AMJ : num  -1.1 0.4 0.2 0.8 -0.5 -0.8 -0.5 0.9 0.7 0.2 ...
#>  $ MJJ : num  -0.9 0.6 0 0.8 -0.5 -0.7 -0.5 1.1 0.6 -0.1 ...
#>  $ JJA : num  -0.5 0.7 -0.1 0.7 -0.6 -0.7 -0.6 1.3 0.6 -0.2 ...
#>  $ JAS : num  -0.4 0.9 0 0.7 -0.8 -0.7 -0.6 1.3 0.4 -0.3 ...
#>  $ ASO : num  -0.4 1 0.2 0.8 -0.9 -1.1 -0.5 1.3 0.4 -0.1 ...
#>  $ SON : num  -0.4 1.2 0.1 0.8 -0.8 -1.4 -0.4 1.4 0.4 0 ...
#>  $ OND : num  -0.6 1 0 0.8 -0.7 -1.7 -0.4 1.5 0.5 0 ...
#>  $ NDJ : num  -0.8 0.8 0.1 0.8 -0.7 -1.5 -0.4 1.7 0.6 0 ...
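
Before indexing into a page's tables, it can help to check what the page actually contains. If it holds fewer tables than the index passed to `which`, there is nothing to parse, which would be consistent with the `"NULL"` signature in the error. Here is a minimal sketch of that check, assuming a made-up "page has moved" response. The HTML string and the example.org link are hypothetical stand-ins, not the real NOAA output.

```r
library(rvest)

# Hypothetical stand-in for a "this page has moved" response (not the real NOAA HTML).
moved_html <- '<html><head><title>Moved</title></head>
<body><p>This page has moved to <a href="http://example.org/new.php">a new address</a>.</p></body></html>'

doc <- read_html(moved_html)

# Count the tables before asking for a specific one.
tables <- html_nodes(doc, "table")
length(tables)

# Links in the body often reveal where the content went.
links <- html_attr(html_nodes(doc, "a"), "href")
links
```

A table count of zero, or one smaller than the index you planned to use, suggests the URL no longer serves the page the code was written for.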