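I am trying to create a data frame from the variables below. However, after using the SelectorGadget tool to work out the CSS selectors needed to scrape this information, the resulting vectors have different lengths, even when I copy the selectors directly from the HTML source. Done correctly, the table should have 34 rows. Here is my code and the resulting error: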
library(rvest)    # read_html(), html_nodes(), html_text() and the %>% pipe
library(stringr)  # str_replace()
library(tibble)   # data_frame() (deprecated in favour of tibble())

womens_bb <- read_html("http://gomason.com/schedule.aspx?path=wbball")

# Each selector is run against the whole page, so each vector gets its own length
womens_opponents <- womens_bb %>%
  html_nodes(".sidearm-schedule-game-opponent-name a") %>%
  html_text()
womens_locations <- womens_bb %>%
  html_nodes(".sidearm-schedule-game-location span:nth-child(1)") %>%
  html_text()
womens_dates <- womens_bb %>%
  html_nodes(".sidearm-schedule-game-opponent-date span:nth-child(1)") %>%
  html_text()
womens_times <- womens_bb %>%
  html_nodes(".sidearm-schedule-game-opponent-date span:nth-child(2)") %>%
  html_text() %>%
  as.numeric()
womens_scores <- womens_bb %>%
  html_nodes("div.sidearm-schedule-game-result span:nth-child(3)") %>%
  html_text() %>%
  as.numeric()
womens_win_loss <- womens_bb %>%
  html_nodes(".text-italic span:nth-child(2)") %>%
  html_text() %>%
  str_replace("\\,", "")

womens_df <- data_frame(
  date = womens_dates, time = womens_times, opponent = womens_opponents,
  location = womens_locations, score = womens_scores, win_loss = womens_win_loss
)
Error: Columns `date`, `time`, `opponent`, `score`, `win_loss` must be length 1 or 37, not 36, 36, 34, 34, 35
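How can I fix this?
Answer 0 (score: 1)
I think the img tags are causing some problems. To avoid them, you can first collect the top-level div for each game (36 of them when I ran the script) and then loop over those, extracting the fields inside each one. Where a tag gives an odd result, apply a small fix-up: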
library(rvest)

womens_bb <- read_html("http://gomason.com/schedule.aspx?path=wbball")
# One node per game, so every field below is looked up inside a single game
divs <- womens_bb %>% html_nodes(".sidearm-schedule-game")
df <- NULL  # accumulates one row per game
for (div in divs) {
  # Opponent: match the name container with or without a link, keep the first
  # match and strip runs of whitespace
  womens_opponents <- div %>%
    html_nodes(".sidearm-schedule-game-opponent-name, .sidearm-schedule-game-opponent-name a") %>%
    html_text()
  womens_opponents <- gsub("\\s{2,}", "", womens_opponents[1])
  # Location: keep only the first matching span
  womens_locations <- div %>%
    html_nodes(".sidearm-schedule-game-location span:nth-child(1)") %>%
    html_text()
  womens_locations <- womens_locations[1]
  womens_dates <- div %>%
    html_nodes(".sidearm-schedule-game-opponent-date span:nth-child(1)") %>%
    html_text()
  womens_times <- div %>%
    html_nodes(".sidearm-schedule-game-opponent-date span:nth-child(2)") %>%
    html_text()
  # Score: games without a result have no matching span, so use an empty string
  womens_scores <- div %>%
    html_nodes("div.sidearm-schedule-game-result span:nth-child(3)") %>%
    html_text()
  if (length(womens_scores) == 0) womens_scores <- ""
  womens_win_loss <- div %>%
    html_nodes(".text-italic span:nth-child(2)") %>%
    html_text()
  womens_win_loss <- gsub("\\,", "", womens_win_loss)
  res <- c(date = womens_dates, time = womens_times, opponent = womens_opponents, location = womens_locations, score = womens_scores, win_loss = womens_win_loss)
  print(length(res))  # should be 6 for every game; anything else flags an odd row
  df <- rbind(df, res)
}
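A variation on the same idea, as a minimal sketch rather than something I ran against the live page: `html_node()` (singular) applied to a set of game nodes returns exactly one result per game, with a missing node where nothing matches, and `html_text()` turns a missing node into `NA`. That keeps every column the same length without an explicit loop. The HTML below is a made-up miniature that only borrows the class names from the question, and `first_text()` is a hypothetical helper, not part of rvest.

```r
library(rvest)

# Hypothetical sample: three games, the second one has no result yet
sample_html <- '
<ul>
  <li class="sidearm-schedule-game">
    <div class="sidearm-schedule-game-opponent-name"><a>Towson</a></div>
    <div class="sidearm-schedule-game-result"><span>W</span><span>,</span><span>70-60</span></div>
  </li>
  <li class="sidearm-schedule-game">
    <div class="sidearm-schedule-game-opponent-name">Navy</div>
  </li>
  <li class="sidearm-schedule-game">
    <div class="sidearm-schedule-game-opponent-name"><a>Drexel</a></div>
    <div class="sidearm-schedule-game-result"><span>L</span><span>,</span><span>55-58</span></div>
  </li>
</ul>'

page  <- read_html(sample_html)
games <- html_nodes(page, ".sidearm-schedule-game")

# Hypothetical helper: first match of `css` inside each game, NA if there is none
first_text <- function(nodes, css) {
  nodes %>%
    html_node(css) %>%
    html_text(trim = TRUE)
}

# Every column has length(games) entries, so the data frame builds cleanly
schedule <- data.frame(
  opponent = first_text(games, ".sidearm-schedule-game-opponent-name"),
  score    = first_text(games, ".sidearm-schedule-game-result span:nth-child(3)"),
  stringsAsFactors = FALSE
)
print(schedule)
```

On the real page the same helper could be called once per field with the selectors from the question; whether those selectors pick the intended span in every game is something to check against the actual markup.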
Hope this helps,
Gottavianoni