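This is the page I'm trying to scrape: http://www.footballlocks.com/nfl_point_spreads_week_1.shtml. I'd like to end up with a simple data.frame with 4 columns so that I can do further analysis. I've tried using the XML package but had no luck. Thanks for your help.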
library(XML)
week.1 <- readHTMLTable("http://www.footballlocks.com/nfl_point_spreads_week_1.shtml")
str(week.1)
Answer 0 (score: 3)
rvest can do this. You can use XPath to find all of the 4-column tables:
library(rvest)
url <- "http://www.footballlocks.com/nfl_point_spreads_week_1.shtml"
pg <- read_html(url)  # html() is deprecated in newer rvest releases
tabs <- pg %>% html_nodes(xpath="//table[@cols='4']")
html_table(tabs[[1]], header=TRUE)
##    Date & Time        Favorite Spread     Underdog
## 1  9/4 8:35 ET      At Seattle   -5.0    Green Bay
## 2  9/7 1:00 ET     New Orleans   -3.0   At Atlanta
## 3  9/7 1:00 ET    At St. Louis   -3.0    Minnesota
## 4  9/7 1:00 ET   At Pittsburgh   -6.0    Cleveland
## 5  9/7 1:00 ET At Philadelphia  -10.0 Jacksonville
## 6  9/7 1:00 ET      At NY Jets   -6.5      Oakland
## 7  9/7 1:00 ET    At Baltimore   -1.0   Cincinnati
## 8  9/7 1:00 ET      At Chicago   -7.0      Buffalo
## 9  9/7 1:00 ET      At Houston   -3.0   Washington
## 10 9/7 1:00 ET  At Kansas City   -3.0    Tennessee
## 11 9/7 1:00 ET     New England   -4.0     At Miami
## 12 9/7 4:25 ET    At Tampa Bay   -4.5     Carolina
## 13 9/7 4:25 ET   San Francisco   -3.5    At Dallas
## 14 9/7 8:30 ET       At Denver   -8.5 Indianapolis
If you'd rather kick it old school:
library(XML)
url <- "http://www.footballlocks.com/nfl_point_spreads_week_1.shtml"
doc <- htmlParse(url)
readHTMLTable(doc["//table[@cols='4']"][[1]])
(same output)
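Since the goal is further analysis, here is a minimal sketch of one way the scraped table might be tidied afterwards. It is not part of either package: clean_spreads() and the column names date_time, favorite, spread, underdog and home are hypothetical, and it assumes the "At " prefix always marks the home team, as it appears to in the output shown earlier. The sample rows are copied from that output so the snippet runs without hitting the site.

```r
# A few rows copied from the scraped output; in practice, pass in the
# data.frame returned by html_table() or readHTMLTable() instead.
spreads <- data.frame(
  "Date & Time" = c("9/4 8:35 ET", "9/7 1:00 ET", "9/7 4:25 ET"),
  Favorite      = c("At Seattle", "New Orleans", "San Francisco"),
  Spread        = c("-5.0", "-3.0", "-3.5"),
  Underdog      = c("Green Bay", "At Atlanta", "At Dallas"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not from the original answer):
# renames the columns, makes the spread numeric, and derives the home
# team from the "At " prefix.
clean_spreads <- function(df) {
  names(df) <- c("date_time", "favorite", "spread", "underdog")
  # as.character() first so factor columns convert correctly
  df$spread <- as.numeric(as.character(df$spread))
  fav_home <- grepl("^At ", df$favorite)
  und_home <- grepl("^At ", df$underdog)
  df$favorite <- sub("^At ", "", df$favorite)
  df$underdog <- sub("^At ", "", df$underdog)
  # NA when neither side carries the prefix (e.g. a neutral-site game)
  df$home <- ifelse(fav_home, df$favorite,
                    ifelse(und_home, df$underdog, NA_character_))
  df
}

cleaned <- clean_spreads(spreads)
print(cleaned)
```

With a numeric spread column, things like mean(cleaned$spread) or subsetting to home favorites work directly.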
Answer 1 (score: 0)
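Pinnacle Sports has an API you can use. It may suit your purposes better than scraping a single week's odds off that web page; it's a commonly used source for football line analysis.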