readHTMLTable function in R is not working

Asked: 2016-04-14 16:29:51

Tags: r xml-parsing web-scraping

I wrote the following code in R to get some names from this particular webpage.

library(RCurl)
library(XML)
x <- getURL("http://www.encyclopedia-titanica.org/titanic-passengers-crew-lived/country-17/england.html")
x_2 <- htmlParse(x)
x_3 <- readHTMLTable(x_2) 

However, whenever I look at the contents of x_3, I get the following:

x_3
named list()

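It seems that the readHTMLTable function is not able to pick up the table. Can anyone help me get the passengers' names from this webpage without copying and pasting? Much appreciated.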

1 Answer:

Answer 0 (score: 0)

library(rvest)
library(dplyr)

base <- "http://www.encyclopedia-titanica.org/titanic-passengers-crew-lived/country-17/england.html"

# This was written against an older rvest release, where the parser was `html()`;
# newer releases use `read_html()` instead. Source of the old function:
# https://github.com/hadley/rvest/blob/7d65d84e013b1bb3827ae0a2e05ddaed4875c112/R/parse.R
data_df <- (read_html(base) %>% html_table())[[1]]

knitr::kable(summary(data_df))

    |   |    Name         |    Age          | Class/Dept      |   Ticket        |   Joined        |    Job          |Boat [Body]      |             |
    |:--|:----------------|:----------------|:----------------|:----------------|:----------------|:----------------|:----------------|:------------|
    |   |Length:1190      |Length:1190      |Length:1190      |Length:1190      |Length:1190      |Length:1190      |Length:1190      |Mode:logical |
    |   |Class :character |Class :character |Class :character |Class :character |Class :character |Class :character |Class :character |NA's:1190    |
    |   |Mode  :character |Mode  :character |Mode  :character |Mode  :character |Mode  :character |Mode  :character |Mode  :character |NA           |
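
Since the question asks for the passengers' names specifically, here is a minimal sketch of pulling just the Name column out of the parsed table. To keep it runnable offline, it parses a small inline HTML string rather than the real page; the two rows and the variable names `page`, `tables`, `passengers`, and `passenger_names` are made-up placeholders, and it assumes the real table has a header cell called Name, as the summary above suggests.

```r
library(rvest)

# Hypothetical stand-in for the real page: a tiny table with invented rows.
page <- read_html('<table>
  <tr><th>Name</th><th>Age</th><th>Joined</th></tr>
  <tr><td>DOE, Mr John</td><td>30</td><td>Southampton</td></tr>
  <tr><td>ROE, Miss Jane</td><td>25</td><td>Southampton</td></tr>
</table>')

# html_table() on a whole document returns a list with one entry per <table>.
tables <- html_table(page)
passengers <- tables[[1]]

# Extract the Name column as a plain character vector.
passenger_names <- passengers[["Name"]]
print(passenger_names)
```

With the real page, `passengers` would be the same object as `data_df` above, so `data_df[["Name"]]` should give the names, provided the column is labelled that way on the site.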