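I am new to R and have been doing some web scraping. I wrote the following code to put the ID, name, colour and price of a specific item from https://uk.burberry.com/ into a data frame.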
Is there a way to write a loop so that I can use this code for every item on the website and put the results in a data frame? Thanks.
Answer (score: 0)
You can create a function that takes a URL as input and returns a data frame containing the information scraped from that page:
library(rvest)  # provides read_html(), html_nodes() and html_text()

get_page_data <- function(url) {
  # Read the HTML code of the product page
  webpage <- read_html(url)

  # Use CSS selectors to scrape the ID section
  id_data_html <- html_nodes(webpage, '.section')
  # Convert the ID to text
  id_data <- html_text(id_data_html)
  # Remove the "Item" label and any surrounding whitespace
  id_data <- trimws(gsub("Item", "", id_data))

  # Use CSS selectors to scrape the name section
  names_data_html <- html_nodes(webpage, '.type-h6')
  # Convert the names to text
  names_data <- html_text(names_data_html)
  # Strip leading/trailing tabs and newlines (more robust than matching an
  # exact run of "\n\t\t\t\t\t\t\t")
  names_data <- trimws(names_data)

  # Use CSS selectors to scrape the price section
  price_data_html <- html_nodes(webpage, '.l2')
  # Convert the price to text
  price_data <- html_text(price_data_html)
  # Remove every tab and newline character
  price_data <- gsub("[\t\n]", "", price_data)

  # Use CSS selectors to scrape the colour section
  colour_data_html <- html_nodes(webpage, '#colour-picker-value')
  # Convert the colour to text
  colour_data <- trimws(html_text(colour_data_html))

  # Create the data frame; data.frame() stops with an error if the selectors
  # return incompatible numbers of matches
  burberry_df <- data.frame(ID = id_data, Name = names_data, Price = price_data,
                            Colour = colour_data, stringsAsFactors = FALSE)
  return(burberry_df)
}
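One fragile point in the function is the final `data.frame()` call: if one selector matches nothing on a page (for example a product without a colour picker), R stops with "arguments imply differing number of rows". A minimal sketch of a guard, assuming each product page describes exactly one item, is a hypothetical helper `first_or_na()` (not part of rvest or the original answer) that reduces each scraped vector to a single value or `NA`:

```r
# Hypothetical helper (not part of rvest): reduce the character vector
# returned by html_text() for one selector to exactly one value, so that
# data.frame() always receives vectors of length 1.
first_or_na <- function(x) {
  x <- trimws(x)          # drop leading/trailing whitespace
  x <- x[nzchar(x)]       # discard strings that are empty after trimming
  if (length(x) == 0) {
    NA_character_         # selector matched nothing useful
  } else {
    x[[1]]                # assumption: the first match is the one we want
  }
}
```

Inside `get_page_data()` you would then build the row as `data.frame(ID = first_or_na(id_data), Name = first_or_na(names_data), Price = first_or_na(price_data), Colour = first_or_na(colour_data), stringsAsFactors = FALSE)`. Whether "first match" is the right choice depends on the page markup, which should be checked for each selector.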
Then, to use the function, simply call it with the URL you are interested in:
url <- 'https://uk.burberry.com/fringed-wool-cashmere-patchwork-cardigan-coat-p40612561'
result <- get_page_data(url)
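The function handles one page at a time, while the question asks for a loop over every item. A minimal sketch, assuming you already have a character vector of product URLs: the hypothetical helper `scrape_many()` below calls a scraper (by default `get_page_data()` from this answer) on each URL with `lapply()`, skips pages that raise an error, and stacks the per-page data frames with `do.call(rbind, ...)`. The one-second `delay` is an arbitrary illustrative value, not a documented requirement of the site.

```r
# Hypothetical helper: apply a page scraper to many URLs and combine the
# resulting data frames into one. The scraper is a parameter so that any
# function taking a URL and returning a data frame can be used.
scrape_many <- function(urls, scraper = get_page_data, delay = 1) {
  results <- lapply(urls, function(u) {
    Sys.sleep(delay)  # pause between requests to avoid hammering the server
    tryCatch(
      {
        df <- scraper(u)
        df$URL <- rep(u, nrow(df))  # record which page each row came from
        df
      },
      error = function(e) {
        message("Skipping ", u, ": ", conditionMessage(e))
        NULL  # failed pages are filtered out below
      }
    )
  })
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) {
    return(data.frame())  # nothing could be scraped
  }
  do.call(rbind, results)  # stack all per-page data frames row-wise
}
```

For example, `all_items <- scrape_many(c(url, other_url))` (where `other_url` stands for any further product URL) returns a single data frame with an extra `URL` column. Building the list of URLs for every item on uk.burberry.com, e.g. by collecting product links from category pages with `html_attr(html_nodes(page, "a"), "href")`, depends on the site's markup and is not covered here.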