Using rvest to scrape multiple web pages in R

Date: 2017-10-25 11:00:21

Tags: r web-scraping rvest

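I'm new to R and have been doing some web scraping. I wrote the following code to put the ID, name, colour and price of a specific item from https://uk.burberry.com/ into a data frame.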

while read line; do
  for token in $line; do
    case $token in
    hello) echo hello,1;;
    world) echo world,1;;
    esac
  done
done

Is there a way to create a loop so that I can run this code for every item on the website and collect the results in a data frame? Thanks.

1 answer:

Answer 0 (score: 0)

You can create a function that takes a URL as input and returns a data frame containing the information collected from that web page:

library(rvest)  # provides read_html(), html_nodes() and html_text()

get_page_data <- function(url) {
    # Read HTML code from the website
    webpage <- read_html(url)

    # using css selectors to scrape the ID section
    id_data_html <- html_nodes(webpage, '.section') 
    #converting the ID to text
    id_data <- html_text(id_data_html)
    # Remove irrelevant text
    id_data <- gsub("Item", "", id_data)

    # using css selectors to scrape the names section
    names_data_html <- html_nodes(webpage, '.type-h6') 
    #converting the names to text
    names_data <- html_text(names_data_html)
    # Stripping irrelevant text
    names_data <- gsub("\n\t\t\t\t\t\t\t", "", names_data)

    # using css selectors to scrape the price section
    price_data_html <- html_nodes(webpage, '.l2') 
    #converting the price to text
    price_data <- html_text(price_data_html)
    # Remove irrelevant text
    price_data <- gsub("\t", "", price_data)
    price_data <- gsub("\n", "", price_data)

    # using css selectors to scrape the colour section
    colour_data_html <- html_nodes(webpage, '#colour-picker-value') 
    #converting the colour to text
    colour_data <- html_text(colour_data_html)

    # creating the dataframe
    burberry_df <- data.frame(ID = id_data, Name = names_data, Price = price_data,
                              Colour = colour_data, stringsAsFactors = FALSE)

    return(burberry_df)
}

To use the function, just call it with the URL you are interested in:

url <- 'https://uk.burberry.com/fringed-wool-cashmere-patchwork-cardigan-coat-p40612561'
result <- get_page_data(url)
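
To cover several items, as the question asks, one option is to apply the function to a vector of URLs and stack the results. The sketch below is only an illustration under a few assumptions: you already have the product URLs in a character vector, the helper name `scrape_pages` and its `delay` argument are made up for this example, and pages that fail to download or parse are simply skipped.

```r
# Apply a single-page scraper to many URLs and stack the results.
# `page_fn` is any function that takes one URL and returns a data frame,
# for example get_page_data() defined above.
# `scrape_pages` and `delay` are hypothetical names for this sketch.
scrape_pages <- function(urls, page_fn, delay = 1) {
    pages <- lapply(urls, function(u) {
        Sys.sleep(delay)  # pause between requests to go easy on the server
        tryCatch(
            page_fn(u),
            error = function(e) {
                message("Skipping ", u, ": ", conditionMessage(e))
                NULL  # a failed page contributes no rows
            }
        )
    })
    # Drop the failed pages, then bind the remaining data frames by row
    pages <- Filter(Negate(is.null), pages)
    do.call(rbind, pages)
}
```

Calling `scrape_pages(product_urls, get_page_data)`, where `product_urls` is your own character vector of product page addresses, should then give one data frame with a row per successfully scraped page. Because `data.frame()` raises an error when a selector matches a different number of elements than the others, such pages end up in the skipped group rather than stopping the whole run.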