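I'm trying to import the data from this website, but it doesn't work at all. It's a simple HTML table, so it should work with the readHTMLTable function from the XML package. Please advise.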
require(XML)
url = 'https://www.archives.gov/federal-register/electoral-college/allocation.html'
table = readHTMLTable(url,header = T,stringsAsFactors=F)
Answer 0 (score: 0)
You can do the following:
library(XML)
library(RCurl)
# Download the page with RCurl's getURL (which handles https), then parse its tables
URL <- "https://www.archives.gov/federal-register/electoral-college/allocation.html"
lst <- readHTMLTable(getURL(URL))
# Remove NULL elements in lst
lst <- Filter(Negate(is.null), lst)

On inspection, the main table is element 4 of lst:

df <- lst[[4]]
df
# State Number of Electoral Votes
#1 Alabama 9
#2 Alaska 3
#3 Arizona 11
#4 Arkansas 6
#5 California 55
#6 Colorado 9
#7 Connecticut 7
#8 Delaware 3
#9 District of Columbia 3
#10 Florida 29
#11 Georgia 16
#12 Hawaii 4
#13 Idaho 4
#14 Illinois 20
#15 Indiana 11
#16 Iowa 6
#17 Kansas 6
#18 Kentucky 8
#19 Louisiana 8
#20 Maine 4
#21 Maryland 10
#22 Massachusetts 11
#23 Michigan 16
#24 Minnesota 10
#25 Mississippi 6
#26 Missouri 10
#27 Montana 3
#28 Nebraska 5
#29 Nevada 6
#30 New Hampshire 4
#31 New Jersey 14
#32 New Mexico 5
#33 New York 29
#34 North Carolina 15
#35 North Dakota 3
#36 Ohio 18
#37 Oklahoma 7
#38 Oregon 7
#39 Pennsylvania 20
#40 Rhode Island 4
#41 South Carolina 9
#42 South Dakota 3
#43 Tennessee 11
#44 Texas 38
#45 Utah 6
#46 Vermont 3
#47 Virginia 13
#48 Washington 12
#49 West Virginia 5
#50 Wisconsin 10
#51 Wyoming 3
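
The reason your approach does not work is that url(), which readHTMLTable calls when you pass it a URL, cannot download over https. So you need to download the page first with getURL and then pass the resulting HTML text to readHTMLTable.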
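To see that only the download step was the problem, here is a minimal offline sketch. readHTMLTable also accepts HTML content passed as a character string, which is what getURL returns. The inline `html_text` below is a hypothetical two-row stand-in for the real page (the real page has several tables, which is why the answer uses `lst[[4]]`), so nothing is fetched over the network.

```r
library(XML)

# Hypothetical stand-in for the text that getURL(URL) returns:
# a tiny page with a single two-column table.
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>State</th><th>Number of Electoral Votes</th></tr>",
  "<tr><td>Alabama</td><td>9</td></tr>",
  "<tr><td>Alaska</td><td>3</td></tr>",
  "</table></body></html>"
)

# Parsing already-downloaded HTML text works; only fetching over https fails.
# header = TRUE uses the first row as column names.
lst <- readHTMLTable(html_text, header = TRUE, stringsAsFactors = FALSE)
df <- lst[[1]]
print(df)
```

With the real page, the same call on `getURL(URL)` returns every table on the page, and you then pick the one you need by index.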
Answer 1 (score: 0)
Here is a solution using the rvest package:
library(tidyverse)
library(rvest)
read_html("https://www.archives.gov/federal-register/electoral-college/allocation.html") %>% # read the HTML page
  html_nodes("table") %>% # extract all nodes that contain a table
  .[[5]] %>% # select the single node that contains the relevant table
  html_table(trim = TRUE) # extract the table as a data frame
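The choice between `.[5]` and `.[[5]]` matters because html_table() returns a list of tables when given a node set and a single data frame when given one node. The following is a minimal offline sketch, assuming rvest is installed. The inline page is hypothetical and has only two tables, so the index here is 2 rather than 5.

```r
library(rvest)

# Hypothetical page standing in for archives.gov: a layout table first,
# then the data table. read_html() accepts literal HTML as well as a URL.
page <- read_html(paste0(
  "<html><body>",
  "<table><tr><td>navigation</td></tr></table>",
  "<table>",
  "<tr><th>State</th><th>Number of Electoral Votes</th></tr>",
  "<tr><td>Alabama</td><td>9</td></tr>",
  "<tr><td>Alaska</td><td>3</td></tr>",
  "</table>",
  "</body></html>"
))

tables <- html_nodes(page, "table")

# Single brackets keep a node set, so html_table() returns a list of tables
as_list <- html_table(tables[2], trim = TRUE)

# Double brackets pull out one node, so html_table() returns one data frame
as_df <- html_table(tables[[2]], trim = TRUE)

str(as_list)
print(as_df)
```

Because the header row consists of `<th>` cells, html_table() uses it for the column names by default.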