Converting data scraped with readHTMLTable() in R

Time: 2013-11-28 22:29:48

Tags: r

I'm trying to scrape this site, http://www.hockeyfights.com/fightlog/, but I'm having a hard time getting the result into a clean data frame. So far I have this:

library(XML)  # provides htmlParse() and readHTMLTable()
asdf <- htmlParse("http://www.hockeyfights.com/fightlog/1")
asdf.asdf <- readHTMLTable(asdf)

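That gives me a huge list. How can I turn it into a 2-column data frame holding the names of the two players in each fight, with n rows (one per fight)?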

Thanks in advance for your help.

1 answer:

Answer 0 (score: 0)

Is this the output you want?

library(XML)  # htmlParse() and readHTMLTable(); RCurl is not needed for this plain http URL
asdf <- htmlParse("http://www.hockeyfights.com/fightlog/1")
asdf.asdf <- readHTMLTable(asdf)

First, make a table of each player and the number of fights they were in...

# get variable with player names
one <- as.character(na.omit(asdf.asdf[[1]]$V3))
# get counts of how many times each name appears
two <- data.frame(table(one))
# remove non-name data
three <- two[two$one != 'Away / Home Player',]
# check
head(three)
              one Freq
1  Aaron Volpatti    1
3  Brandon Bollig    1
4     Brian Boyle    1
5 Brian McGrattan    1
6      Chris Neil    2
7  Colin Greening    1
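To see what the counting step does without fetching the page, here is a minimal sketch on a made-up vector. The names "Player A", "Player B" and "Player C" are hypothetical, and the layout (each fight contributes an "Away / Home Player" label entry, then the away name, then the home name) is an assumption inferred from the answer's subsetting, not checked against the live site. This variant drops the label entries before counting and sorts the result so the most frequent fighters come first.

```r
# Hypothetical stand-in for as.character(na.omit(asdf.asdf[[1]]$V3)).
# Assumed layout per fight: label entry, away player, home player.
one <- c("Away / Home Player", "Player A", "Player B",
         "Away / Home Player", "Player C", "Player A",
         "Away / Home Player", "Player B", "Player A")

# Drop the label entries before counting instead of filtering afterwards
players <- one[one != "Away / Home Player"]

# table() counts occurrences; as.data.frame() turns the counts into
# a 'player' column and a 'Freq' column
counts <- as.data.frame(table(player = players), stringsAsFactors = FALSE)

# Sort by count (descending), breaking ties alphabetically
counts <- counts[order(-counts$Freq, counts$player), ]
rownames(counts) <- NULL
print(counts)
```

Filtering before calling table() means the label never shows up as a level, so there is no row to remove afterwards.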

Second, build a data frame of who fought whom in each fight...

# build the pairs by taking every third name:
# positions 2, 5, 8, ... are the away players, 3, 6, 9, ... the home players
four <- data.frame(away = one[seq(2, length(one), 3)],
                   home = one[seq(3, length(one), 3)])
# check
head(four)
            away            home
1    Brian Boyle     Zdeno Chara
2    Tom Sestito      Chris Neil
3     Dale Weise Mark Borowiecki
4 Brandon Bollig Brian McGrattan
5 Scott Hartnell     Eric Brewer
6 Colin Greening  Aaron Volpatti
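The seq() indexing only works if every fight occupies exactly three consecutive entries. A sketch of the same pairing done with matrix(..., byrow = TRUE) makes that assumption explicit and stops with an error if it breaks. This is a swapped-in technique, not the answerer's code, and the toy vector and its "Away / Home Player" layout are the same hypothetical assumptions as above.

```r
# Same hypothetical vector: label, away player, home player for each fight
one <- c("Away / Home Player", "Player A", "Player B",
         "Away / Home Player", "Player C", "Player A",
         "Away / Home Player", "Player B", "Player A")

# Guard the assumption that the vector splits evenly into 3-entry blocks
stopifnot(length(one) %% 3 == 0)

# Fill row by row so each matrix row is one fight: label, away, home
m <- matrix(one, ncol = 3, byrow = TRUE)

# Check that the first column really holds only the label entries
stopifnot(all(m[, 1] == "Away / Home Player"))

fights <- data.frame(away = m[, 2], home = m[, 3], stringsAsFactors = FALSE)
print(fights)
```

On well-formed input this gives the same rows as the seq(2, ...) / seq(3, ...) version; the difference is that a stray or missing entry triggers an error instead of silently shifting every later pair.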