I am trying to scrape book titles from Amazon:
Using a CSS selector:
rm(list = ls())
library(rvest)
library(XML)
library(xml2)
url_amazon <- 'https://www.amazon.com/s/browse?_encoding=UTF8&node=283155&ref_=nav_shopall-export_nav_mw_sbd_intl_books'
web_page<-read_html(url_amazon)
Extracting the titles with html_nodes (the selector string below is CSS, not XPath):
rank_titles<-html_text(html_nodes(web_page,".a-link-normal .a-size-base"))
But the book titles it returns are not correct. Why? What am I doing wrong?
Any help?
Answer 0 (score: 0)
This fetches every table on the page.
library(XML)
library(RCurl)
url <- "https://www.amazon.com/s/browse?_encoding=UTF8&node=283155&ref_=nav_shopall-export_nav_mw_sbd_intl_books"
# Download the raw HTML, then parse every <table> into a data frame
page_html <- getURL(url)
tables <- readHTMLTable(page_html, stringsAsFactors = FALSE)
#Shows you all the tables pulled
str(tables)
#To view a particular table
View(tables$results)
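To see why `tables$results` works, here is a minimal sketch on a hand-written HTML fragment rather than the live page. The fragment and its contents are invented for illustration; the only point is that `readHTMLTable` returns a list whose names come from each table's `id` attribute.

```r
library(XML)

# Hypothetical HTML standing in for a downloaded page (not Amazon's real markup)
html <- '<html><body>
<table id="results">
  <tr><th>title</th><th>price</th></tr>
  <tr><td>Book A</td><td>10</td></tr>
  <tr><td>Book B</td><td>12</td></tr>
</table>
</body></html>'

# asText = TRUE tells the parser the string is HTML content, not a file path
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# The list is named after the id attribute of each table
names(tables)
tables$results
```

If the real page has no table with `id="results"`, `tables$results` will be `NULL`, so checking `names(tables)` first is a reasonable habit.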
Does that help?
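For the titles themselves, one possible cause (an assumption, not something confirmed against the live page) is that a descendant selector such as `.a-link-normal .a-size-base` matches more than just title nodes. The sketch below uses invented class names on a hand-written fragment to show how a broad selector picks up extra text, how a narrower one avoids it, and how the same query looks when passed through the `xpath` argument.

```r
library(rvest)

# Hypothetical markup; the class names are made up for illustration
page <- read_html('<html><body>
<div class="item">
  <a class="link"><span class="title">Book A</span></a>
  <a class="link"><span class="price">$10</span></a>
</div>
<div class="item">
  <a class="link"><span class="title">Book B</span></a>
  <a class="link"><span class="price">$12</span></a>
</div>
</body></html>')

# Broad descendant selector: every span inside a .link, titles and prices alike
broad <- html_text(html_nodes(page, ".link span"), trim = TRUE)

# Narrower CSS selector: only spans that carry the title class
titles_css <- html_text(html_nodes(page, ".link .title"), trim = TRUE)

# The same query as an XPath expression, passed via the xpath argument
titles_xpath <- html_text(
  html_nodes(page, xpath = "//a[@class='link']/span[@class='title']"),
  trim = TRUE
)

print(broad)
print(titles_css)
print(titles_xpath)
```

Running the selector against a small fragment like this, or printing `html_nodes(web_page, ...)` before calling `html_text`, makes it easier to see which elements are actually being matched.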