Question

Thanks in advance for your help.

Using R, I am trying to scrape county-level results for the 2016 presidential election from Politico's website, http://www.politico.com/2016-election/results/map/president/ohio/ (the URL ends in ohio because that is my test case).

To do this, I am using two functions from R's XML library: first htmlTreeParse, then xpathSApply.

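I have located the element I want to capture, but the query returns only 11 results: the statewide total and the first 10 counties. The problem is that Ohio has 88 counties.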

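After inspecting the HTML to see what changes after the 11th row, the only difference I can identify is an extra attribute on every 10th row (which would be consistent with my results), named 'data-z="10"', 'data-z="20"', and so on.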
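
To make that row-by-row comparison concrete, here is a minimal sketch of how the attributes of each row can be listed side by side. The HTML string is hypothetical: it is not copied from Politico and only reuses the class names from my XPath plus the data-z attribute, so it shows the inspection technique rather than the real page structure.

```r
library(XML)

# Hypothetical markup, NOT copied from Politico: three result rows,
# two of which carry an extra data-z attribute.
html <- '<table class="results-table">
<tr class="type-republican"><td class="results-percentage"><span class="number">51.3%</span></td></tr>
<tr class="type-republican" data-z="10"><td class="results-percentage"><span class="number">60.1%</span></td></tr>
<tr class="type-republican" data-z="20"><td class="results-percentage"><span class="number">47.8%</span></td></tr>
</table>'

doc <- htmlParse(html, asText = TRUE)

# Every <tr> in the document, with no class filter, so nothing is dropped yet.
rows <- getNodeSet(doc, "//tr")

# One line per row: its class, its data-z value (NA when absent), and its text.
row_info <- data.frame(
  class  = sapply(rows, xmlGetAttr, "class", NA_character_),
  data_z = sapply(rows, xmlGetAttr, "data-z", NA_character_),
  value  = sapply(rows, xmlValue),
  stringsAsFactors = FALSE
)

print(row_info)
```

Running the same listing against the parsed live page (with the real row XPath in place of "//tr") should show whether the rows after the 11th differ in anything other than data-z.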

My code is shown below; you should be able to paste it in and run it:

library(XML)
library(RCurl)

county_election_url <- "http://www.politico.com/2016-election/results/map/president/ohio"
county_parse <- htmlTreeParse(county_election_url, useInternalNodes = TRUE)

xpathSApply(county_parse, "//div[@id = 'globalWrapper']//div[@class = 'super-duper']//article[@class = 'results-group']//table[@class = 'results-table']//tr[@class = 'type-republican']//td[@class = 'results-percentage']//span[@class = 'percentage-combo']//span[@class = 'number']", xmlValue)
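
One thing worth ruling out with an XPath this strict: a predicate such as @class = 'results-table' is an exact string comparison, so any node whose class attribute carries an additional token is silently skipped. Whether that is what happens on the Politico page is an assumption I have not confirmed. The sketch below uses hypothetical markup (the "extra" class token is made up) to show the difference between exact and token-based class matching.

```r
library(XML)

# Hypothetical markup, NOT copied from Politico. The third table has a
# second, made-up class token ("extra") in its class attribute.
html <- '<div id="globalWrapper">
<table class="results-table"><tr class="type-republican"><td><span class="number">51.3%</span></td></tr></table>
<table class="results-table"><tr class="type-republican"><td><span class="number">60.1%</span></td></tr></table>
<table class="results-table extra"><tr class="type-republican"><td><span class="number">47.8%</span></td></tr></table>
</div>'

doc <- htmlParse(html, asText = TRUE)

# Exact string comparison on @class: misses "results-table extra".
exact <- xpathSApply(
  doc,
  "//table[@class = 'results-table']//span[@class = 'number']",
  xmlValue
)

# Token-based comparison: pad the normalized class list with spaces and
# look for the token surrounded by spaces, so extra tokens do not matter.
token_xpath <- paste0(
  "//table[contains(concat(' ', normalize-space(@class), ' '), ' results-table ')]",
  "//span[@class = 'number']"
)
token <- xpathSApply(doc, token_xpath, xmlValue)

length(exact)  # number of values found with exact matching
length(token)  # number of values found with token-based matching
```

If the two counts differ on the real page as well, the class predicates are the culprit; if both still stop at 11, the remaining rows are probably not in the HTML that htmlTreeParse receives at all.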

Can this be solved with the XML library, or do I need to build a rocket ship?

Scraping HTML in R with xpathSApply - not returning all rows

0 answers: