Question

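I'm trying to import the data from this website, but it doesn't work at all. It's a simple HTML table, so it should be a good fit for the readHTMLTable function in the XML package. Please advise.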

require(XML)
url = 'https://www.archives.gov/federal-register/electoral-college/allocation.html'
table = readHTMLTable(url,header = T,stringsAsFactors=F)

Answer 1

You could do the following:


library(XML)
library(RCurl)

# Download the page with RCurl (which handles https), then parse its tables
URL <- "https://www.archives.gov/federal-register/electoral-college/allocation.html"
lst <- readHTMLTable(getURL(URL))

# Remove NULL elements in lst
lst <- Filter(Negate(is.null), lst)

Upon inspection, we find that the main table is element 4 of lst:

df <- lst[[4]]
df
#                  State Number of Electoral Votes
#1               Alabama                         9
#2                Alaska                         3
#3               Arizona                        11
#4              Arkansas                         6
#5            California                        55
#6              Colorado                         9
#7           Connecticut                         7
#8              Delaware                         3
#9  District of Columbia                         3
#10              Florida                        29
#11              Georgia                        16
#12               Hawaii                         4
#13                Idaho                         4
#14             Illinois                        20
#15              Indiana                        11
#16                 Iowa                         6
#17               Kansas                         6
#18             Kentucky                         8
#19            Louisiana                         8
#20                Maine                         4
#21             Maryland                        10
#22        Massachusetts                        11
#23             Michigan                        16
#24            Minnesota                        10
#25          Mississippi                         6
#26             Missouri                        10
#27              Montana                         3
#28             Nebraska                         5
#29               Nevada                         6
#30        New Hampshire                         4
#31           New Jersey                        14
#32           New Mexico                         5
#33             New York                        29
#34       North Carolina                        15
#35         North Dakota                         3
#36                 Ohio                        18
#37             Oklahoma                         7
#38               Oregon                         7
#39         Pennsylvania                        20
#40         Rhode Island                         4
#41       South Carolina                         9
#42         South Dakota                         3
#43            Tennessee                        11
#44                Texas                        38
#45                 Utah                         6
#46              Vermont                         3
#47             Virginia                        13
#48           Washington                        12
#49        West Virginia                         5
#50            Wisconsin                        10
#51              Wyoming                         3

The reason your approach doesn't work is that url(), which readHTMLTable calls when you pass it a URL, cannot download from https. So you need to download the file first with getURL() and then pass the downloaded HTML to readHTMLTable().
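The NULL-removal step can be tried without touching the network. This is a minimal sketch: the list below is a hypothetical stand-in for what readHTMLTable() returns (several tables, some of them NULL), not the real output for this page. It shows that Filter(Negate(is.null), lst) shifts every later table forward, so an index like 4 only holds for one particular page layout. The Find() lookup by column name is an assumed, more robust alternative to a hard-coded index, not part of the answer above.

```r
# Hypothetical stand-in for the list readHTMLTable() returns:
# one entry is NULL, and the table we want comes last.
lst <- list(
  data.frame(Link = c("Home", "About")),
  NULL,
  data.frame(Note = "Updated"),
  data.frame(Year = 2010),
  data.frame(
    State = c("Alabama", "Alaska", "Arizona"),
    `Number of Electoral Votes` = c(9, 3, 11),
    check.names = FALSE
  )
)

# Drop the NULL entries, as in the answer; later elements move forward
clean <- Filter(Negate(is.null), lst)
length(lst)    # 5
length(clean)  # 4

# Assumed alternative: pick the table by its header instead of by position
votes <- Find(function(d) "State" %in% names(d), clean)
votes
```

Here the vote table sits at position 5 in the raw list and at position 4 after filtering; selecting it by a column name keeps working if the page gains or loses a layout table.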

Answer 2

Here is a solution using the rvest package:

library(tidyverse)
library(rvest)

read_html("https://www.archives.gov/federal-register/electoral-college/allocation.html") %>% # read the html page
  html_nodes("table") %>% # extract the nodes that contain a table
  .[5] %>% # select the node that contains the relevant table (the fifth one)
  html_table(trim = TRUE) # extract the table
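Note that `.[5]` keeps a node set, so html_table() returns a list holding one table rather than the table itself; indexing with `[[5]]` returns the table directly. The sketch below shows the difference on a small invented HTML string with two tables (standing in for the page's five), assuming read_html() is given literal HTML, which it accepts alongside URLs and file paths.

```r
library(rvest)

# Invented HTML with two tables; the second stands in for the vote table
page <- read_html(
  "<html><body>
     <table><tr><th>Link</th></tr><tr><td>Home</td></tr></table>
     <table>
       <tr><th>State</th><th>Votes</th></tr>
       <tr><td>Alabama</td><td>9</td></tr>
       <tr><td>Alaska</td><td>3</td></tr>
     </table>
   </body></html>"
)

tables <- html_nodes(page, "table")

# Single brackets keep a node set, so html_table() returns a list of tables
as_list <- html_table(tables[2], trim = TRUE)

# Double brackets extract one node, so html_table() returns the table itself
as_tbl <- html_table(tables[[2]], trim = TRUE)
as_tbl$State
```

On the real page, `.[[5]]` in place of `.[5]` would therefore give a data frame you can use straight away, instead of a one-element list.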

R: readHTMLTable returns an empty list
