Question

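I hope you can help. I'm trying to extract the "results" table from the following URL: https://www.moneysupermarket.com/credit-cards/search/results/?goal=CC_ALLCARDS

I've been using rvest, following this blog post: https://www.r-bloggers.com/a-text-mining-function-for-websites/

This approach worked when I adapted the code for uSwitch, but I think the MSM site is more complicated.

Here is my uSwitch code: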

## Load the libraries
library(tidyverse)     # General purpose data wrangling
library(rvest)         # Parsing of html/xml files
library(stringr)       # String manipulation
library(rebus)         # Verbose regular expressions
library(lubridate)     # Eases datetime manipulation

################################################################################

##  BUILD THE BT TABLE

## Define the page to scrape
bt.url <- 'https://www.uswitch.com/credit-cards/credit-card-balance-transfers/'

## Download and parse the page once, then reuse it for every selector
bt.page <- read_html(bt.url)

## Get the top ten brands
bt.brand <- bt.page %>%
     html_nodes(".us-ct-row__title--mobile strong") %>%
     html_text()

## Get the primary offer
bt.primary.offer <- bt.page %>%
     html_nodes(".us-ct-row__col--highlight") %>%
     html_text()

## Get the offer details
bt.offer.details <- bt.page %>%
     html_nodes(".us-ct-row__key-details-col:nth-child(1)") %>%
     html_text()

## Strip the leading "Card details" label
bt.clean.offer.details <- bt.offer.details %>%
     str_replace("^Card details", "")

## Get the cost to the customer
bt.cost.to.cust <- bt.page %>%
     html_nodes(".us-ct-row__name--fee span") %>%
     html_text()
## Create a sequence of even numbers (assumes ten cards, i.e. 20 elements)
even.seq <- seq(2, 20, 2)
## Keep the even elements because the £ sign and the value are scraped as separate elements
bt.cost.to.cust <- bt.cost.to.cust[even.seq]

## Get the APR
bt.apr <- bt.page %>%
     html_nodes(".us-ct-row__col--highlight+ .us-ct-row__col--stretch .us-ct-row__name span") %>%
     html_text()

## Get the offer duration
bt.offer.duration <- bt.page %>%
     html_nodes(".us-ct-row__col--highlight strong") %>%
     html_text()

## Stitch it all together
bt.table <- as.matrix(cbind(bt.brand, bt.primary.offer, bt.offer.duration, bt.cost.to.cust, 
                            bt.apr, bt.clean.offer.details))
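
A possible refinement of the even-index step, as a minimal sketch: the hard-coded seq(2, 20, 2) assumes exactly ten cards are listed. A recycled logical index keeps every second element whatever the length. The vector cost.raw below is a made-up stand-in for the scraped values, not real uSwitch output.

```r
## Hypothetical stand-in for the scraped vector, where each "£" sign
## and its value arrive as separate elements
cost.raw <- c("£", "0", "£", "25", "£", "3.50")

## c(FALSE, TRUE) is recycled along the vector, so positions 2, 4, 6, ...
## are kept regardless of how many cards the page lists
cost.values <- cost.raw[c(FALSE, TRUE)]
```

This only holds while the page keeps emitting the sign and the value as alternating elements; if that layout changes, the selector itself needs revisiting.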

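This all works as I want it to; I'd just like to be able to replicate it for the web page above, if possible.

Failing that, I found a post that recommends checking the DOC or XHR section of the Network tab in the browser console.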

Scrape data from flash page using rvest

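I did this and can see the results table in the console under Network > XHR > results > Preview, but I can't work out how to pull it back into R. This approach would be the best one, because it provides more information than is actually presented on the page, but after about a week of trial and error I'll take anything!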
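
For reference, a minimal sketch of what the XHR route might look like. A response that shows up under Network > XHR with a structured Preview is usually JSON, so it would be parsed with jsonlite rather than with rvest's CSS selectors. Everything specific here is an assumption: sample.json and its field names are invented stand-ins for the real response body, and xhr.url is a placeholder for the Request URL that would have to be copied from the Network tab. The real request may also need a different method, headers, or cookies.

```r
library(jsonlite)  # JSON parsing

## Hypothetical stand-in for the body of the XHR response; the real
## structure and field names must be checked in the Preview tab
sample.json <- '{"results": [
  {"brand": "Card A", "apr": 19.9},
  {"brand": "Card B", "apr": 21.9}
]}'

## fromJSON() simplifies an array of records into a data frame
msm.parsed <- fromJSON(sample.json)
msm.table <- msm.parsed$results

## Against the live site, the text would come from the request itself.
## xhr.url is a placeholder, not a known endpoint:
# library(httr)
# resp <- GET(xhr.url, add_headers(Accept = "application/json"))
# msm.parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
```

Inspecting str(msm.parsed) on the real response would show where the table actually sits in the nested structure.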

Trouble scraping information from a web page using rvest

0 answers: