============
Table
============
Pagination: Link1, Link2, Link3, Link4, LinkNext, LinkLast
Using SelectorGadget, I determined that the pagination is located at ".pagination-container, a".
I want to follow http://francojc.github.io/web-scraping-with-rvest/
I started with:
library(tidyverse)
library(rvest)
url <- "https://aplikacje.nfz.gov.pl/umowy/Provider/Index?ROK=2017&OW=07&ServiceType=03&Code=&Name=&City=&Nip=&Regon=&Product=&OrthopedicSupply=false"
urls <- url %>% # feed `main.page` to the next step
html_nodes(".pagination-container, a") %>% # get the CSS nodes
html_text("href")
This throws an error at html_nodes:
Error in UseMethod("xml_find_all") :
no applicable method for 'xml_find_all' applied to an object of class "character"
What am I doing wrong?
Answer 0 (score: 4)
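Beyond the "typo" (i.e. the missing call to read_html(), which is why html_nodes() receives a character string instead of a parsed document), there is a simpler way to get the total number of pages. Just target the [>>] link in the paginator: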
library(rvest)
library(stringi)
library(tidyverse)
url <- "https://aplikacje.nfz.gov.pl/umowy/Provider/Index?ROK=2017&OW=07&ServiceType=03&Code=&Name=&City=&Nip=&Regon=&Product=&OrthopedicSupply=false"
pg <- read_html(url)
html_nodes(pg, "li.PagedList-skipToLast > a") %>%
html_attr("href") %>%
stri_match_last_regex("page=([[:digit:]]+)") %>%
.[,2]
## [1] "13"
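Once the page count is known, it can be turned into one URL per page. The following is only a minimal sketch: the inline HTML is a hypothetical stand-in for the real page (the actual markup may differ), and appending `&page=N` to the query URL is an assumption based on the `page=` parameter seen in the [>>] link, not something confirmed against the site.

```r
library(rvest)

# Hypothetical stand-in for the fetched page; the real markup may differ.
html <- '
<div class="pagination-container">
  <ul class="pagination">
    <li class="active"><a>1</a></li>
    <li><a href="/umowy/Provider/Index?ROK=2017&amp;page=2">2</a></li>
    <li class="PagedList-skipToNext"><a href="/umowy/Provider/Index?ROK=2017&amp;page=2">&gt;</a></li>
    <li class="PagedList-skipToLast"><a href="/umowy/Provider/Index?ROK=2017&amp;page=13">&gt;&gt;</a></li>
  </ul>
</div>'

# Parse first: html_nodes() needs a parsed document, not a character string.
pg <- read_html(html)

# Take the href of the [>>] link (html_attr, not html_text) and pull out the page number.
last_href <- pg %>%
  html_nodes("li.PagedList-skipToLast > a") %>%
  html_attr("href")
n_pages <- as.integer(sub(".*page=([0-9]+).*", "\\1", last_href))

# Assumption: each page is reachable by appending &page=N to the original query URL.
base_url <- "https://aplikacje.nfz.gov.pl/umowy/Provider/Index?ROK=2017&OW=07&ServiceType=03&Code=&Name=&City=&Nip=&Regon=&Product=&OrthopedicSupply=false"
page_urls <- paste0(base_url, "&page=", seq_len(n_pages))
```

Against the live site, `read_html(url)` would replace the inline string, and each element of `page_urls` could then be read in turn, for example with `lapply(page_urls, read_html)`.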