Question

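I have a table from a website that I am trying to download, and it appears to be made up of a bunch of tables. Right now I use rvest to bring the table in as text, but that also brings in a bunch of other tables I am not interested in. I then coerce the data into a better format, but the process is not reproducible. Here is my code: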

library(rvest)
library(tidyr)
library(data.table) # needed for data.table() below

# Auto Download Data
# reads the page for the race
race_url <- read_html("http://racing-reference.info/race/2016_Folds_of_Honor_QuikTrip_500/W")
# reads in the cells as text; this selector matches too many tables
race_results <- race_url %>%
        html_nodes(".col") %>%
        html_text()
race_results <- data.table(race_results) # turns the character vector into a DT
f <- nrow(race_results) # counts the number of rows in the data
# drops rows 496 (11*45 + 1) onward, since there are never more than 43 racers
race_results <- race_results[-c(496:f)]
# puts the data into a format with 1:11 numbering for unstacking
testDT <- data.frame(X = race_results$race_results, ind = rep(1:11, nrow(race_results)/11))
testDT <- unstack(testDT, X~ind) # unstacking data into 11 columns
colnames(testDT) <- as.character(unlist(testDT[1, ])) # using the first row as the header

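I commented everything so you can see what I am trying to do. If you go to the URL, there is a top table containing the driver results, which is what I want to scrape. My code also scrapes the tables at the bottom, because I cannot get html_nodes to work with any selector other than ".col". I also tried html_table() instead of html_text(), but it did not work. I imagine this could be done either by identifying the table in the CSS (which I cannot figure out) or by using a different type of call or the XML library (which I also cannot figure out). Any help or direction is appreciated.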
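For reference, the CSS route mentioned above might look roughly like the sketch below. It assumes the results table carries a distinguishing class; the `results` and `notes` classes and the inline HTML are made up for illustration, and the real page's markup may well differ.

```r
library(rvest)

# Hypothetical stand-in for the page: a layout table wrapping two inner tables.
html <- '
<html><body>
<table id="layout"><tr><td>
  <table class="results">
    <tr><th>Fin</th><th>Driver</th></tr>
    <tr><td>1</td><td>Driver A</td></tr>
    <tr><td>2</td><td>Driver B</td></tr>
  </table>
</td></tr><tr><td>
  <table class="notes">
    <tr><th>Flag</th><th>Laps</th></tr>
    <tr><td>Green</td><td>10</td></tr>
  </table>
</td></tr></table>
</body></html>'

page <- read_html(html)

# Select only the inner table of interest (assumed class) and parse it.
results <- page %>%
  html_node("table.results") %>%
  html_table()
```

Selecting the inner table directly avoids pulling in the layout table and its other children, which is what a broad selector like ".col" ends up doing.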

Update:

Based on the comments below, the correct code to extract this data is as follows:

library(rvest)
library(tidyr)

# Auto Download Data
race_url <- read_html("http://racing-reference.info/race/2016_Folds_of_Honor_QuikTrip_500/W") # reads the page for the race
race_results <- race_url %>% html_nodes("table") # returns a node set with all of the tables on the page
race_results <- race_results[7] %>% html_table() # parses the 7th table; the result is a list of one data frame
race_results <- data.frame(race_results) # turns the list into a data frame
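Picking table 7 by position depends on the page layout staying the same. A possible alternative, sketched here on made-up inline HTML rather than the real page, is to keep the table that has no nested table and whose header contains an expected column name. The "Driver" header and the sample rows are assumptions for illustration.

```r
library(rvest)

# Hypothetical stand-in for the page: a layout table wrapping two inner tables.
html <- '<table><tr><td>
  <table><tr><th>Flag</th><th>Laps</th></tr><tr><td>Green</td><td>10</td></tr></table>
</td></tr><tr><td>
  <table><tr><th>Fin</th><th>Driver</th></tr><tr><td>1</td><td>Driver A</td></tr></table>
</td></tr></table>'

tables <- read_html(html) %>% html_nodes("table")

# Keep a table only if it has no nested table and one header cell reads "Driver".
is_results <- vapply(tables, function(tbl) {
  is_leaf <- length(html_nodes(tbl, "table")) == 0
  headers <- trimws(html_text(html_nodes(tbl, "th")))
  is_leaf && "Driver" %in% headers
}, logical(1))

stopifnot(any(is_results)) # fail loudly if no table matches
race_results <- html_table(tables[[which(is_results)[1]]])
```

The leaf check matters because the outer layout table also contains the "Driver" header text through its nested child.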

Scraping a table made up of tables in R

0 answers: