Web scraping Amazon book titles

Time: 2019-09-12 00:36:31

Tags: r web-scraping

I am trying to scrape the book titles from an Amazon books page.


Loading the page:

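rm(list = ls())

# rvest re-exports xml2::read_html() and provides html_nodes() and html_text()
library(rvest)
library(XML)
library(xml2)

url_amazon <- 'https://www.amazon.com/s/browse?_encoding=UTF8&node=283155&ref_=nav_shopall-export_nav_mw_sbd_intl_books'

# Download and parse the results page
web_page <- read_html(url_amazon)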

Extracting the titles with a CSS selector:

# Text of every element matching the CSS selector
rank_titles <- html_text(html_nodes(web_page, ".a-link-normal .a-size-base"))

But the book titles I get back are not correct. Why? What am I doing wrong?
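One common reason a selector returns the "wrong" titles is that a descendant selector such as `.a-link-normal .a-size-base` matches more elements than intended: it selects every `.a-size-base` element anywhere inside any `.a-link-normal` element. The sketch below illustrates this offline on a small, made-up HTML snippet. The markup and the `a-text-normal` class are assumptions for illustration, not Amazon's actual page structure.

```r
library(rvest)

# Hypothetical, simplified markup for illustration only; this is NOT
# Amazon's real HTML. One result contains a title link and a review-count
# link, and both use the .a-link-normal / .a-size-base classes.
html <- '<div class="s-result-item">
  <a class="a-link-normal" href="/book-1">
    <span class="a-size-base a-text-normal">Example Title One</span>
  </a>
  <a class="a-link-normal" href="/book-1#reviews">
    <span class="a-size-base">1,234</span>
  </a>
</div>'

# A string containing "<" is parsed as literal HTML, so no request is made
page <- read_html(html)

# The selector from the question: every .a-size-base inside any
# .a-link-normal, so the review count comes back alongside the title
all_matches <- html_text(html_nodes(page, ".a-link-normal .a-size-base"), trim = TRUE)
print(all_matches)

# Also requiring a class that (in this made-up markup) only the title
# span carries keeps just the title
titles <- html_text(html_nodes(page, ".a-link-normal .a-size-base.a-text-normal"), trim = TRUE)
print(titles)
```

On the live page it may help to print `length(rank_titles)` and a few of its values, and to inspect the HTML that `read_html()` actually received, before settling on a selector. rvest does not run JavaScript, so that HTML can differ from what a browser's inspector shows.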


1 Answer:

Answer 0 (score: 0)

This will fetch all the tables on the page.

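library(XML)
library(RCurl)

url <- "https://www.amazon.com/s/browse?_encoding=UTF8&node=283155&ref_=nav_shopall-export_nav_mw_sbd_intl_books"

# Download the raw HTML of the page as a single string
html <- getURL(url)

# Parse it explicitly as HTML text, then turn every <table> into a data frame
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# Show the structure of all the tables that were pulled
str(tables)

# View a particular table (here the one named "results")
View(tables$results)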
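To see what `readHTMLTable()` returns without contacting the live site, here is a small offline sketch on made-up HTML. The table id `results` and its columns are hypothetical. `readHTMLTable()` returns a list with one data frame per `<table>` element, and `tables$results` above presumably relies on those list elements being named after each table's `id` attribute.

```r
library(XML)

# Hypothetical page with a single table whose id is "results";
# the markup is made up for illustration.
html <- '<html><body>
<table id="results">
  <tr><th>title</th><th>author</th></tr>
  <tr><td>Book A</td><td>Author A</td></tr>
  <tr><td>Book B</td><td>Author B</td></tr>
</table>
</body></html>'

# Parse the HTML text, then convert every <table> into a data frame,
# using the first row as column names
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# One list element per <table> in the document
str(tables)

# The first (and only) table as a data frame
print(tables[[1]])
```

If the search results on the real page are not laid out as `<table>` elements, `readHTMLTable()` will not return them, so checking `str(tables)` on the downloaded page first shows whether this approach can work at all.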
