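Hoping you can help me. I'm trying to extract the "results" table from the following URL: https://www.moneysupermarket.com/credit-cards/search/results/?goal=CC_ALLCARDS
I've been using rvest, following this blog post: https://www.r-bloggers.com/a-text-mining-function-for-websites/
This approach worked when I adapted the code for uSwitch, but I think the MSM (MoneySuperMarket) site is more complex.
Here is my uSwitch code: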
## Load the libraries
library(tidyverse) # General purpose data wrangling
library(rvest) # Parsing of html/xml files
library(stringr) # String manipulation
library(rebus) # Verbose regular expressions
library(lubridate) # Eases datetime manipulation
################################################################################
## BUILD THE BT TABLE
## Define the page to scrape
bt.url <- 'https://www.uswitch.com/credit-cards/credit-card-balance-transfers/'
## Get the top ten brands
bt.brand <- read_html(bt.url) %>%
html_nodes(".us-ct-row__title--mobile strong") %>%
html_text()
bt.primary.offer <- read_html(bt.url) %>%
html_nodes(".us-ct-row__col--highlight") %>%
html_text()
# Get the offer details
bt.offer.details <- read_html(bt.url) %>%
html_nodes(".us-ct-row__key-details-col:nth-child(1)") %>%
html_text()
bt.clean.offer.details <- bt.offer.details %>%
str_replace("^Card details*", "")
## Get the cost to the customer
bt.cost.to.cust <- read_html(bt.url) %>%
html_nodes(".us-ct-row__name--fee span") %>%
html_text()
## Indices of every second element (2, 4, 6, ...), however many rows come back
even.seq <- which(seq_along(bt.cost.to.cust) %% 2 == 0)
## Keep the even observations because the £ sign and the value are split into separate rows
bt.cost.to.cust <- bt.cost.to.cust[even.seq]
## Get the APR
bt.apr <- read_html(bt.url) %>%
html_nodes(".us-ct-row__col--highlight+ .us-ct-row__col--stretch .us-ct-row__name span") %>%
html_text()
## Get the offer duration
# .us-ct-row__col--highlight strong
bt.offer.duration <- read_html(bt.url) %>%
html_nodes(".us-ct-row__col--highlight strong") %>%
html_text()
## Stitch it all together
bt.table <- as.matrix(cbind(bt.brand, bt.primary.offer, bt.offer.duration, bt.cost.to.cust,
bt.apr, bt.clean.offer.details))
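One possible tidy-up of the above, as a sketch only: parse the page once and reuse it for every selector, then check that all columns have the same length before binding them. `get_text` is a hypothetical helper (not part of rvest), and the inline HTML is a made-up stand-in for the downloaded page that reuses two of the class names from the code above.

```r
library(rvest)

# Hypothetical helper (not part of rvest): pull the text for one CSS
# selector out of an already-parsed page.
get_text <- function(page, css) {
  page %>% html_nodes(css) %>% html_text(trim = TRUE)
}

# Made-up stand-in for read_html(bt.url), so the sketch runs offline.
page <- read_html('<div>
  <div class="us-ct-row__title--mobile"><strong>Brand A</strong></div>
  <div class="us-ct-row__col--highlight"><strong>30 months</strong></div>
  <div class="us-ct-row__title--mobile"><strong>Brand B</strong></div>
  <div class="us-ct-row__col--highlight"><strong>28 months</strong></div>
</div>')

cols <- list(
  brand          = get_text(page, ".us-ct-row__title--mobile strong"),
  offer.duration = get_text(page, ".us-ct-row__col--highlight strong")
)

# cbind() recycles shorter vectors, so confirm the lengths match first.
stopifnot(length(unique(lengths(cols))) == 1)
demo.table <- do.call(cbind, cols)
```

With the real page, `page <- read_html(bt.url)` would replace the inline HTML, so the site is downloaded once rather than once per column.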
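This all works as I want; I would just like to be able to replicate it for the MSM page above, if possible.
Failing that, I found a post suggesting that I check the DOC or XHR section of the Network tab in the browser console:
Scrape data from flash page using rvest
I did this and can see the results table under Network > XHR > results > Preview in the console, but I cannot pull it into R. This route would be preferable, since it exposes more information than is actually shown on the page, but after about a week of trial and error I'll take anything!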
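For illustration, the XHR route might look roughly like the sketch below, using httr and jsonlite. Everything specific here is an assumption: `results.url` is a placeholder to be replaced with the request URL shown under Network > XHR > results > Headers, `fetch_json_text` is a hypothetical helper, and the sample JSON (a top-level `results` array with `brand` and `apr` fields) is made up so the parsing step can run offline. The real request may also need the same method, headers, cookies or body that the browser sent.

```r
library(jsonlite)

# ASSUMPTION: placeholder only. Copy the real request URL from
# Network > XHR > results > Headers in the browser console.
results.url <- "https://www.moneysupermarket.com/PATH-COPIED-FROM-THE-XHR-REQUEST"

# Hypothetical helper: download the raw JSON text behind the XHR request.
fetch_json_text <- function(url) {
  resp <- httr::GET(url,
                    httr::add_headers(Accept = "application/json"),
                    httr::user_agent("Mozilla/5.0"))
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Made-up sample standing in for the real response; the real field names
# will differ and should be read off the Preview tab.
sample.json <- '{"results": [{"brand": "Card A", "apr": 19.9},
                             {"brand": "Card B", "apr": 21.9}]}'

# json.text <- fetch_json_text(results.url)  # live request, once the URL is known
json.text <- sample.json

parsed <- fromJSON(json.text, flatten = TRUE)
msm.table <- parsed$results  # an array of objects is simplified to a data frame
str(msm.table)
```

If the response nests the cards deeper than one level, `str(parsed, max.level = 2)` is a quick way to find the element that holds the table.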