I am trying to loop through a list of horse numbers, appending each one to the base URL after `horseno=`. However, much of the time I either get a "subscript out of bounds" error or get back character(0).
library(rvest)
library(tidyverse)
horsenumber <- c("S385", "T436", "B016", "V102", "B121", "A370", "V026", "V107", "V086", "A082", "T267", "B059", "T118", "V077", "S393", "T230", "A061", "B387", "T370", "B165", "B326",
"B317", "B159", "B353", "T029", "T233", "A357", "A334", "A235", "T412", "V074", "B133", "T022", "A195", "T253", "A233", "V338", "B182", "A071", "V407", "B197", "B421",
"A427", "T282", "A359", "A069", "A097", "A351", "S397", "A305", "T112", "V334", "S204", "P421", "S277", "B141", "A333", "T380", "A005", "A189", "A314", "V381", "S420",
"A419", "V243", "A284", "S388", "A125", "B370", "A408", "A057", "A086", "B242", "A424", "B292", "T388", "V072", "V250", "A177", "T134", "A067", "A074", "A417", "B265",
"B170", "T419", "T389", "B080", "B300", "V336", "B119", "B204", "B144", "B260", "B350", "B056", "A150", "B209", "T200", "B149", "B249", "T349")
data <- lapply(paste0('http://racing.hkjc.com/racing/information/english/horse/horse.aspx?horseno=', horsenumber),
  function(url){
    # Download and parse each page once, then reuse it for every field
    page <- url %>% read_html()
    horsename <- page %>%
      html_nodes(".title_text") %>%
      html_text()
    age <- page %>%
      html_nodes("td tr:nth-child(1) td:nth-child(2) span") %>%
      html_text()
    sex <- page %>%
      html_nodes("tr:nth-child(2) td:nth-child(2) span") %>%
      html_text()
    rhistory <- page %>%
      html_nodes("tr:nth-child(6) td:nth-child(2) span.table_eng_text") %>%
      html_text()
    r10day <- page %>%
      html_nodes("tr:nth-child(7) td:nth-child(2) span.table_eng_text") %>%
      html_text()
    rating <- page %>%
      html_nodes("tr:nth-child(3) td:nth-child(4) .table_eng_text") %>%
      html_text()
    # Stack the fields as rows of a character matrix
    rbind(horsename, age, sex, rhistory, r10day, rating)
  })
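For reference, a minimal sketch of how an empty selector match could be turned into `NA` rather than `character(0)`, so that every field keeps its slot. The helper name `first_text_or_na` and the inline demo page are assumptions made up for illustration; they are not taken from the HKJC site.

```r
library(rvest)

# Hypothetical helper: text of the first node matching `css`,
# or NA when the selector matches nothing on the page.
first_text_or_na <- function(page, css) {
  txt <- page %>% html_nodes(css) %>% html_text()
  if (length(txt) == 0) NA_character_ else txt[[1]]
}

# Tiny inline page standing in for a real horse page (assumed markup).
demo_page <- read_html(
  '<html><body><span class="title_text">Demo Horse</span></body></html>'
)

found   <- first_text_or_na(demo_page, ".title_text")    # selector matches
missing <- first_text_or_na(demo_page, ".no_such_class") # selector matches nothing
```

With a helper like this, each field contributes exactly one value per horse, which makes it easier to see which selectors fail on which pages.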
In addition, I tried the following to scrape that particular table and turn it into a data frame for data mining. However, I also received `Error in .[[6]] : subscript out of bounds`.
horse_info <- page %>%
html_nodes('table') %>%
.[[6]] %>%
html_table(fill=TRUE)
horse_info
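As a diagnostic, a sketch like the one below counts how many `table` nodes the parsed page actually contains before indexing into them. The inline demo page is an assumption standing in for a real horse page, and `wanted` is a hypothetical name for the table index.

```r
library(rvest)

# Inline page with a single table, standing in for a real horse page (assumed markup).
demo_page <- read_html(
  '<html><body><table>
     <tr><th>Horse</th><th>Age</th></tr>
     <tr><td>Demo</td><td>5</td></tr>
   </table></body></html>'
)

tables   <- html_nodes(demo_page, "table")
n_tables <- length(tables)

# Only index the sixth table if the page really has that many.
wanted <- 6
horse_info <- if (n_tables >= wanted) html_table(tables[[wanted]]) else NULL

# The first table is always safe to read here because n_tables is at least 1.
first_table <- html_table(tables[[1]])
```

If `n_tables` comes back smaller than 6 for a given horse page, the sixth table is not present in the HTML that `read_html()` received, which would explain the error.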
Any help is much appreciated.