Web scraping in R: "... does not exist in current working directory" error

Asked: 2016-10-25 03:18:49

Tags: r web-scraping xml2

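I am trying to scrape some tables from ESPN.com with the xml2 package. As an example, I would like to get the Week 7 fantasy quarterback rankings into R. The URL is: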

http://www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-quarterback-rankings

I am trying to do this with the "read_html()" function, because it is the one I am most familiar with. Here is my call and the error it produces:

> wk.7.qb.rk = read_html("www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks", which = 1)
Error: 'www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks' does not exist in current working directory ('C:/Users/Brandon/Documents/Fantasy/Football/Daily').

I also tried "read_xml()", only to get the same error:

> wk.7.qb.rk = read_xml("www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks", which = 1)
Error: 'www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks' does not exist in current working directory ('C:/Users/Brandon/Documents/Fantasy/Football/Daily').

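Why is R looking for this URL in the working directory? I have used this function with other URLs and had some success. What is it about this particular URL that makes R treat it differently from the others? And how can I change that?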

1 Answer:

Answer 0: (score: 2)

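I got this error when running read_html in a loop over 20 pages. After page 20 the loop kept running with no URLs left, so the remaining iterations called read_html with NAs. Hope this helps!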
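
A minimal sketch of guarding against that situation, in case it is useful. It drops NA entries before any call to read_html, and it also adds "http://" to entries that lack a scheme, on the assumption that xml2 treats a string without a scheme as a local file path (which would match the "does not exist in current working directory" message in the question). The names clean_urls, page_urls and read_pages are hypothetical, as is the second URL in the example input.

```r
# Hypothetical helper: keep only usable URLs and make sure each has a scheme.
clean_urls <- function(urls) {
  # Drop NA and empty entries so read_html is never called on them
  urls <- urls[!is.na(urls) & nzchar(urls)]
  # Assumption: a string without http:// or https:// is read as a file path,
  # so prepend a scheme where it is missing
  needs_scheme <- !grepl("^https?://", urls)
  urls[needs_scheme] <- paste0("http://", urls[needs_scheme])
  urls
}

# Illustrative input: a full URL, one without a scheme, and a trailing NA
page_urls <- c(
  "http://www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-quarterback-rankings",
  "www.espn.com/fantasy/football/",
  NA
)

safe_urls <- clean_urls(page_urls)

# Reading needs network access, so it is wrapped in a function here.
# A page that fails to load yields NULL instead of stopping the loop.
read_pages <- function(urls) {
  lapply(urls, function(u) {
    tryCatch(xml2::read_html(u), error = function(e) NULL)
  })
}
```

Calling read_pages(safe_urls) would then fetch only the cleaned entries. Iterating over the vector itself, rather than over a fixed count such as 1:20, is another way to keep the loop from running past the last URL.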