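I have successfully used the XML package to scrape several websites, but I am having trouble building a data frame from this particular page: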
library(XML)
url <- paste("http://www.foxsports.com/nfl/injuries?season=2013&seasonType=1&week=1", sep = "")
df1 <- readHTMLTable(url)
print(df1)
> print(df1)
$`NULL`
NULL
$`NULL`
NULL
$`NULL`
Player Pos Injury Game Status
1 Dickson, Ed TE thigh Probable
2 Jensen, Ryan C foot Doubtful
3 Jones, Arthur DE illness Out
4 McPhee, Pernell LB knee Probable
5 Pitta, Dennis TE dislocated hip Injured Reserve (DFR)
6 Thompson, Deonte WR foot Doubtful
7 Williams, Brandon DT toe Doubtful
$`NULL`
Player Pos Injury Game Status
1 Anderson, C.J. RB knee Out
2 Ayers, Robert DE Achilles Probable
3 Bailey, Champ CB foot Out
4 Clady, Ryan T shoulder Probable
5 Dreessen, Joel TE knee Out
6 Kuper, Chris G ankle Doubtful
7 Osweiler, Brock QB left shoulder Probable
8 Welker, Wes WR ankle Probable
$`NULL`
etc
If I try to force it into a data frame, I get this error:
> df1 <- data.frame(readHTMLTable(url))
Error in data.frame(`NULL` = NULL, `NULL` = NULL, `NULL` = list(Player = 1:7, :
arguments imply differing number of rows: 0, 7, 8, 6, 9, 1, 11, 4, 12, 5, 21, 3, 2, 15
I would like all of the injury data (Player, Pos, Injury, Game Status) for all teams.
Thanks in advance.
Answer 0: (score: 2)
You just need to drop the empty (NULL) elements and the one-column tables that say "No injuries reported", then use do.call to rbind what is left:
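# Keep only non-NULL tables that have the four expected columns
n <- sapply(df1, function(x) !is.null(x) && ncol(x) == 4)
# Stack the remaining tables into one data frame and reset the row names
x <- do.call("rbind", df1[n])
rownames(x) <- NULL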
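To try the filter-and-rbind step without hitting the website, here is a minimal sketch on a hand-built list. The mock `df1` below is an assumption that only imitates the shape of what `readHTMLTable(url)` returned in the question (NULL entries, four-column tables, and a one-column "No injuries reported" table). The name `injuries` is illustrative.

```r
# Mock stand-in for the list returned by readHTMLTable(url); rows are illustrative.
df1 <- list(
  NULL,
  data.frame(Player = c("Dickson, Ed", "Jensen, Ryan"),
             Pos = c("TE", "C"),
             Injury = c("thigh", "foot"),
             `Game Status` = c("Probable", "Doubtful"),
             check.names = FALSE, stringsAsFactors = FALSE),
  data.frame(V1 = "No injuries reported", stringsAsFactors = FALSE),
  data.frame(Player = "Bailey, Champ",
             Pos = "CB",
             Injury = "foot",
             `Game Status` = "Out",
             check.names = FALSE, stringsAsFactors = FALSE)
)

# TRUE only for non-NULL elements that have exactly four columns.
keep <- sapply(df1, function(x) !is.null(x) && ncol(x) == 4)

# Stack the surviving tables row-wise and reset the row names.
injuries <- do.call("rbind", df1[keep])
rownames(injuries) <- NULL
print(injuries)
```

The `!is.null(x) &&` guard matters because `ncol(NULL)` returns NULL, and the short-circuit avoids comparing NULL against 4.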
Answer 1: (score: 1)
# Packages
require(XML)
require(RCurl)
# URL of interest
url <- paste("http://www.foxsports.com/nfl/injuries?season=2013&seasonType=1&week=1", sep = "")
# Parse HTML
doc <- htmlParse(url)
# Tables which are not nulls
df1 <- readHTMLTable(doc)
df.list <- df1[!as.vector(sapply(df1, is.null))]
# Get table names
table.names <- xpathSApply(doc, "//div[@class='wisfb_injuryHeader']", function(x) gsub("^\\s+|\\s+$", "", xmlValue(x)))
# Assign names
names(df.list) <- table.names
# $`San Diego Chargers`
# Player Pos Injury Game Status
# 1 Floyd, Malcom WR knee Probable
# 2 Ingram, Melvin LB Torn left ACL Day-to-Day
# 3 Liuget, Corey DE shoulder Probable
# 4 Patrick, Johnny CB concussion, not injury related Probable
# 5 Royal, Eddie WR chest, concussion Probable
# 6 Taylor, Brandon S knee Probable
# 7 Te'o, Manti LB foot Out
# 8 Wright, Shareece CB chest Probable
# #[etc.]
Edit: I just saw that what @Spacedman posted under @Chris S's answer is essentially the same.
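If a single data frame is wanted rather than a named list, one possible follow-up is to stack the named tables and record the team in a new column. This is only a sketch: the mock `df.list` below stands in for the named list built above, the rows are illustrative, and the `Team` column and `all.injuries` name are assumptions that do not appear in the original answers.

```r
# Mock stand-in for the named list of per-team tables; rows are illustrative.
df.list <- list(
  `San Diego Chargers` = data.frame(Player = c("Floyd, Malcom", "Te'o, Manti"),
                                    Pos = c("WR", "LB"),
                                    Injury = c("knee", "foot"),
                                    `Game Status` = c("Probable", "Out"),
                                    check.names = FALSE, stringsAsFactors = FALSE),
  `Denver Broncos` = data.frame(Player = "Welker, Wes",
                                Pos = "WR",
                                Injury = "ankle",
                                `Game Status` = "Probable",
                                check.names = FALSE, stringsAsFactors = FALSE)
)

# Number of rows contributed by each team's table.
n.rows <- sapply(df.list, nrow)

# Stack the tables, then label every row with the team it came from.
all.injuries <- do.call("rbind", df.list)
all.injuries$Team <- rep(names(df.list), times = n.rows)
rownames(all.injuries) <- NULL
print(all.injuries)
```

This assumes every element of the list has the same four columns, so any "No injuries reported" tables would need to be filtered out first.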