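R: readHTMLTable returns an empty list

Time: 2019-03-04 22:39:24

Tags: r xml

I am trying to import the data on this website, but it simply does not work. It is a plain HTML table, so it should be readable with the readHTMLTable function from the XML package. Please advise.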

require(XML)
url = 'https://www.archives.gov/federal-register/electoral-college/allocation.html'
table = readHTMLTable(url,header = T,stringsAsFactors=F)

2 answers:

Answer 0 (score: 0)

You can do the following:

library(XML)
library(RCurl)

# Read HTML
URL <- "https://www.archives.gov/federal-register/electoral-college/allocation.html"
lst <- readHTMLTable(getURL(URL))

# Remove NULL elements in lst
lst <- Filter(Negate(is.null), lst)

On inspection, the main table turns out to be element 4 of lst:

df <- lst[[4]]
df
#    State                Number of Electoral Votes
#1   Alabama              9
#2   Alaska               3
#3   Arizona              11
#4   Arkansas             6
#5   California           55
#6   Colorado             9
#7   Connecticut          7
#8   Delaware             3
#9   District of Columbia 3
#10  Florida              29
#11  Georgia              16
#12  Hawaii               4
#13  Idaho                4
#14  Illinois             20
#15  Indiana              11
#16  Iowa                 6
#17  Kansas               6
#18  Kentucky             8
#19  Louisiana            8
#20  Maine                4
#21  Maryland             10
#22  Massachusetts        11
#23  Michigan             16
#24  Minnesota            10
#25  Mississippi          6
#26  Missouri             10
#27  Montana              3
#28  Nebraska             5
#29  Nevada               6
#30  New Hampshire        4
#31  New Jersey           14
#32  New Mexico           5
#33  New York             29
#34  North Carolina       15
#35  North Dakota         3
#36  Ohio                 18
#37  Oklahoma             7
#38  Oregon               7
#39  Pennsylvania         20
#40  Rhode Island         4
#41  South Carolina       9
#42  South Dakota         3
#43  Tennessee            11
#44  Texas                38
#45  Utah                 6
#46  Vermont              3
#47  Virginia             13
#48  Washington           12
#49  West Virginia        5
#50  Wisconsin            10
#51  Wyoming              3

The reason your approach does not work is that url(), which readHTMLTable calls when it is given a URL, cannot download from https. You therefore need to download the page first with getURL.
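A hard-coded index such as 4 breaks as soon as the page layout changes. The following is a minimal sketch, not part of the original answer, that picks the table by its column name instead. The list here is a mock stand-in for the output of readHTMLTable (its contents are illustrative assumptions), so it runs in base R without a network connection.

```r
# Mock stand-in for the list returned by readHTMLTable(); values are illustrative
lst <- list(
  NULL,
  data.frame(Menu = "Home", stringsAsFactors = FALSE),
  NULL,
  data.frame(State = c("Alabama", "Alaska"),
             `Number of Electoral Votes` = c("9", "3"),
             check.names = FALSE, stringsAsFactors = FALSE)
)

# Drop NULL elements, as in the answer
lst <- Filter(Negate(is.null), lst)

# Keep only the data frames that have a "State" column, then take the first
has_state <- vapply(lst, function(x) "State" %in% names(x), logical(1))
df <- lst[has_state][[1]]

# In this mock the counts are stored as text, so convert them to integers
df[["Number of Electoral Votes"]] <- as.integer(df[["Number of Electoral Votes"]])
```

With the real page, only the first assignment would change: lst would come from readHTMLTable(getURL(URL)) as shown in the answer.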

Answer 1 (score: 0)

Here is a solution using the rvest package:

library(tidyverse)
library(rvest)

read_html("https://www.archives.gov/federal-register/electoral-college/allocation.html") %>% # read the html page
  html_nodes("table") %>% # extract nodes which contain a table
  .[5] %>% # select the node which contains the relevant table
  html_table(trim = T) # extract the table
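Note that .[5] keeps a node set of length one, so html_table returns a list containing a single table rather than the table itself. As a hedged sketch, assuming a small inline HTML string (hypothetical, standing in for the real page), selecting the node with [[ ]] gives the data frame directly:

```r
library(rvest)

# Hypothetical two-table page standing in for the real one
html <- '<html><body>
<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
<table>
<tr><th>State</th><th>Number of Electoral Votes</th></tr>
<tr><td>Alabama</td><td>9</td></tr>
<tr><td>Alaska</td><td>3</td></tr>
</table>
</body></html>'

# Collect all table nodes, then pick one node with [[ ]] instead of [ ]
tables <- html_nodes(read_html(html), "table")
votes <- html_table(tables[[2]], trim = TRUE)
```

For the real page the index would be 5, as in the answer, and read_html would take the URL instead of the inline string.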