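I have a question. I am trying to scrape two tables from a non-HTML (ASP) website. The URL is in the code below.
I suspect I am going about this the wrong way, but I have not found any answers. Here is what I tried: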
library(tidyverse)
library(rvest)
library(XML)
library(httr)
url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"
poptable <- readHTMLTable(url, which = 1)
and got this error:
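Error in (function (classes, fdef, mtable)  : unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message:
XML content does not seem to be XML: 'https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp'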
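I assumed I could still use the readHTMLTable function even though this is an ASP site. Are there any alternatives? I have spent hours searching and have not found anything that works.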
Answer 0 (score: 3)
Actually, this is quite simple (based on @lukeA's answer):
library(rvest)
url <- "https://www.datadictionary.nhs.uk/web_site_content/supporting_information/main_specialty_and_treatment_function_codes_table.asp"
page <- read_html(url)
nodes <- html_nodes(page, "table") # you can use Selectorgadget to identify the node
table <- html_table(nodes[[1]]) # each element of the nodes list is one table that can be extracted
head(table)
Code Main Specialty Title
1 Surgical Specialties Surgical Specialties Surgical Specialties
2 100 GENERAL SURGERY
3 101 UROLOGY
4 110 TRAUMA & ORTHOPAEDICS
5 120 ENT
6 130 OPHTHALMOLOGY
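Since the question asks for both tables, here is a minimal sketch of extracting every table on a page in one call. The inline HTML and its column titles are hypothetical stand-ins for the NHS page, so the example runs offline. With the real page you would pass `url` to `read_html()` instead.

```r
library(rvest)

# Hypothetical stand-in for the real page: two small tables in one HTML string
html <- "<html><body>
<table>
  <tr><th>Code</th><th>Main Specialty Title</th></tr>
  <tr><td>100</td><td>GENERAL SURGERY</td></tr>
  <tr><td>101</td><td>UROLOGY</td></tr>
</table>
<table>
  <tr><th>Code</th><th>Treatment Function Title</th></tr>
  <tr><td>100</td><td>General Surgery</td></tr>
</table>
</body></html>"

page <- read_html(html)             # read_html() also accepts literal HTML
nodes <- html_nodes(page, "table")  # one node per <table> element

# Passing the whole node set returns a list with one table per node
tables <- html_table(nodes)

main_specialty <- tables[[1]]
treatment_function <- tables[[2]]
```

Indexing the list by position assumes the tables appear in a known order on the page, so it is worth checking `length(tables)` and the column names before relying on it.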
Selectorgadget can be installed from here: Selectorgadget by Hadley Wickham
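If you would rather keep using readHTMLTable, one possible workaround is to download the page yourself and hand the parser the HTML text instead of the URL. The warning in the question suggests the XML package could not read the https address directly, though that diagnosis is an assumption. The helper name `read_table_from_text` and the inline sample table are hypothetical and only illustrate the idea.

```r
library(XML)
library(httr)

# Hypothetical helper: parse HTML that has already been downloaded as text
read_table_from_text <- function(html_text, which = 1) {
  # asText = TRUE tells the parser the string is content, not a file name or URL
  doc <- htmlParse(html_text, asText = TRUE)
  readHTMLTable(doc, which = which, stringsAsFactors = FALSE)
}

# Offline check on a tiny inline table
sample_html <- "<table><tr><th>Code</th><th>Title</th></tr><tr><td>100</td><td>GENERAL SURGERY</td></tr></table>"
sample_table <- read_table_from_text(sample_html)

# Against the live page (needs network access, so left commented out):
# resp <- GET(url)
# stop_for_status(resp)
# poptable <- read_table_from_text(content(resp, as = "text", encoding = "UTF-8"))
```

Here httr handles the HTTPS request and the XML package only parses a string, which sidesteps whatever prevented it from reading the URL itself.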