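I'm trying to extract the links from http://money.howstuffworks.com/business-profiles.htm together with each article's title and short summary. The output should contain the title and short summary of every article listed on that page.
I can already get the links. Could you suggest how to get the title and summary for each link? My code is below.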
install.packages('rvest')
#Loading the rvest package
library('rvest')
library(xml2)
# Specify the URL of the website to be scraped
url <- 'http://money.howstuffworks.com/business-profiles.htm'
pg <- read_html(url)
head(html_attr(html_nodes(pg, "a"), "href"))
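The summary text is not inside the `<a>` element, so a flat set of anchors can only pair each `href` with that link's own visible text; getting the summary as well means selecting each article's enclosing element first. A minimal sketch of the pairing, on a hypothetical inline snippet built with `minimal_html()` (assumes rvest ≥ 1.0, where `html_elements()` is the newer name for `html_nodes()`):

```r
library(rvest)

# Hypothetical inline page, NOT the real howstuffworks markup
pg <- minimal_html('
  <a href="/kickstarter.htm">How Kickstarter Works</a>
  <a href="/mcdonalds-arches.htm">Golden Arches</a>
')

# Select the anchors once, then read text and href from the same nodeset
# so the two columns stay aligned row by row
anchors <- html_elements(pg, "a")
links <- data.frame(
  text = html_text(anchors, trim = TRUE),
  href = html_attr(anchors, "href"),
  stringsAsFactors = FALSE
)
print(links)
```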
Answer (score: 2)
We can use purrr to go through each article node and extract the relevant pieces:
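library(rvest)  # also provides the %>% pipe
library(purrr)  # map_df() also needs dplyr installed to bind the rows
url <- 'http://money.howstuffworks.com/business-profiles.htm'
articles <- read_html(url) %>%
  # one node per article card
  html_nodes('.infinite-item > .media') %>%
  map_df(~{
    # html_node() (singular) returns at most one match per card,
    # so each card produces exactly one row
    title <- .x %>%
      html_node('.media-heading > h3') %>%
      html_text()
    head <- .x %>%
      html_node('p') %>%
      html_text()
    link <- .x %>%
      html_node('p > a') %>%
      html_attr('href')
    data.frame(title, head, link, stringsAsFactors = FALSE)
  })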
head(articles)
#> title
#> 1 How Amazon Same-day Delivery Works
#> 2 10 Companies That Completely Reinvented Themselves
#> 3 10 Trade Secrets We Wish We Knew
#> 4 How Kickstarter Works
#> 5 Can you get rich selling stuff online?
#> 6 Are the Golden Arches really supposed to be giant french fries?
#> head
#> 1 The Amazon same-day delivery service aims to get your package to you in no time at all. Learn how Amazon same-day delivery works. See more »
#> 2 You might be surprised at what some of today's biggest companies used to do. Here are 10 companies that reinvented themselves from HowStuffWorks. See more »
#> 3 Trade secrets are often locked away in corporate vaults, making their owners a fortune. Which trade secrets are the stuff of legend? See more »
#> 4 Kickstarter is a service that utilizes crowdsourcing to raise funds for your projects. Learn about how Kickstarter works at HowStuffWorks. See more »
#> 5 Can you get rich selling your stuff online? Find out more in this article by HowStuffWorks.com. See more »
#> 6 Are McDonald's golden arches really suppose to be giant french fries? Check out this article for a brief history of McDonald's golden arches. See more »
#> link
#> 1 http://money.howstuffworks.com/amazon-same-day-delivery.htm
#> 2 http://money.howstuffworks.com/10-companies-reinvented-themselves.htm
#> 3 http://money.howstuffworks.com/10-trade-secrets.htm
#> 4 http://money.howstuffworks.com/kickstarter.htm
#> 5 http://money.howstuffworks.com/can-you-get-rich-selling-online.htm
#> 6 http://money.howstuffworks.com/mcdonalds-arches.htm
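To try the per-card approach without hitting the live site (whose markup may well have changed since this answer was written), here is a minimal sketch on a hypothetical inline snippet that mimics the structure the selectors above assume. It assumes rvest ≥ 1.0, using `minimal_html()` plus `html_elements()`/`html_element()` (the newer names for `html_nodes()`/`html_node()`), and base `lapply()` in place of purrr. The second card deliberately has no link, to show that a missing element becomes `NA` instead of shifting the rows out of alignment.

```r
library(rvest)

# Hypothetical snippet mimicking the assumed card structure;
# the real page may differ
html <- minimal_html('
  <div class="infinite-item">
    <div class="media">
      <div class="media-heading"><h3>How Kickstarter Works</h3></div>
      <p>Kickstarter raises funds for projects. <a href="http://money.howstuffworks.com/kickstarter.htm">See more</a></p>
    </div>
  </div>
  <div class="infinite-item">
    <div class="media">
      <div class="media-heading"><h3>Article without a link</h3></div>
      <p>No link here.</p>
    </div>
  </div>
')

# One node per article card
cards <- html_elements(html, ".infinite-item > .media")

# Build one single-row data frame per card; html_element() yields a
# "missing" node when nothing matches, which html_text()/html_attr() turn into NA
rows <- lapply(cards, function(card) {
  data.frame(
    title   = html_text(html_element(card, ".media-heading > h3"), trim = TRUE),
    summary = html_text(html_element(card, "p"), trim = TRUE),
    link    = html_attr(html_element(card, "p > a"), "href"),
    stringsAsFactors = FALSE
  )
})
articles <- do.call(rbind, rows)
print(articles)
```

Extracting per card, rather than running three page-wide selectors and gluing the resulting vectors together, is what keeps each title, summary and link on the same row even when one piece is absent.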
Obligatory comment: in this case I don't see any clause against harvesting content in their Terms and conditions, but always check a site's terms before scraping it.