Web Scraping - Using R, error in bind_rows_(x, .id)

Time: 2017-02-09 11:14:23

Tags: html r xml web-scraping

My output data should look like the image I have attached.


I used the code below, but I get this error:

Error in bind_rows_(x, .id) : 
  Can not automatically convert from character to integer in column "Runs"

This is the sample code I used:

require(rvest)
require(tidyverse)

urls <- c("http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=1;innings_number=1;orderby=start;result=1;template=results;type=batting;view=match",
"http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=2;innings_number=2;orderby=start;result=1;template=results;type=batting;view=match",
"http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=1;innings_number=1;orderby=start;result=2;template=results;type=batting;view=match",
"http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=1;innings_number=2;orderby=start;result=2;template=results;type=batting;view=match",
"http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=2;innings_number=1;orderby=start;result=2;template=results;type=batting;view=match",
"http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=2;innings_number=2;orderby=start;result=2;template=results;type=batting;view=match"
)

extra_cols <- list(tibble("Team"="IND","Player"="B.Kumar","won"=1,"lost"=0,"D"=1,"D/N"=0,"innings"=1,"Format"="ODI"),
                   tibble("Team"="IND","Player"="B.Kumar","won"=1,"lost"=0,"D"=0,"D/N"=1,"innings"=2,"Format"="ODI"),
                   tibble("Team"="IND","Player"="B.Kumar","won"=0,"lost"=1,"D"=1,"D/N"=0,"innings"=1,"Format"="ODI"),
                   tibble("Team"="IND","Player"="B.Kumar","won"=0,"lost"=1,"D"=1,"D/N"=0,"innings"=2,"Format"="ODI"),
                   tibble("Team"="IND","Player"="B.Kumar","won"=0,"lost"=1,"D"=0,"D/N"=1,"innings"=1,"Format"="ODI"),
                   tibble("Team"="IND","Player"="B.Kumar","won"=0,"lost"=1,"D"=0,"D/N"=1,"innings"=2,"Format"="ODI")
)

doc <- map(urls, read_html) %>% 
  map(html_node, ".engineTable:nth-child(5)")



# Skip pages that returned "No records": html_node() yields an xml_missing object for them
keep <- map_lgl(doc, ~ !inherits(., "xml_missing"))

table<-map(doc[keep], html_table, fill = TRUE) %>% 
  map2_df(extra_cols[keep], cbind)

1 Answer:

Answer 0 (score: 0)

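The problem is that the Runs column sometimes contains a "-". When there is no "-", html_table interprets Runs as an integer column; when there is a "-", it interprets it as character. bind_rows then cannot combine the two column types.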

Clearly, "-" should be interpreted as NA. This can be achieved with type_convert, as follows:

table <- map(doc[keep], html_table, fill = TRUE) %>%
  map(type_convert, na = c("", NA, "-")) %>%
  map2_df(extra_cols[keep], cbind)

The na argument maps "-" to NA.
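The type mismatch and the fix can be reproduced without any network access. The sketch below is a minimal illustration, not the original scraping code: the two small tables (`no_dash`, `with_dash`) and their values are made up to stand in for scraped pages, and "NA" is passed as a string in the `na` vector, following readr's documented default.

```r
library(dplyr)
library(readr)

# Hypothetical stand-ins for two scraped tables (values are invented).
# One page has only numbers in Runs, the other contains a "-" placeholder.
no_dash   <- tibble(Runs = c(3L, 25L), Opposition = c("v NZ", "v SA"))
with_dash <- tibble(Runs = c("12", "-", "7"),
                    Opposition = c("v AUS", "v ENG", "v SL"))

# Combining an integer column with a character column fails.
bind_failed <- tryCatch({
  bind_rows(no_dash, with_dash)
  FALSE
}, error = function(e) TRUE)

# Re-parse the character columns, treating "-" as a missing value.
fixed <- type_convert(with_dash, na = c("", "NA", "-"))

# Runs is now numeric in both tables, so they can be stacked.
combined <- bind_rows(no_dash, fixed)
```

Note that `type_convert` only re-parses character columns, so a column that contains other non-numeric markers besides "-" would still stay character and would need its own handling.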