Need a little help converting htmltab output to a tibble

Time: 2018-09-05 20:13:40

Tags: html-table tidyr tibble

I'm trying to help a friend by pulling data on Miami Dolphins football games.

library(htmltab)
library(tidyr)
library(tibble)

url <- "http://www.espn.com/nfl/team/schedule/_/name/mia"
data <- htmltab(doc = url, which = 1, header = 2)

unique(data)

as_tibble(data)

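The extracted table repeats the same headers (variables). I'm missing something. I need some help converting the htmltab output to a tibble. Thanks.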

What the table should look like

1 answer:

Answer 0 (score: 0)

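So, I'm using the "rvest" package to get the data from the website. I think the main problem is that the site does not provide a clean table format that can be used directly. You have to clean it up to get the desired output.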

rm(list=ls())
library(tidyverse)
library(rvest)

##### get data from web #####
url = "http://www.espn.com/nfl/team/schedule/_/name/mia"
tb <- url %>%
  read_html() %>%
  html_table() # this function is actually going to read all tables at this url
rawdata = tb[[1]] # tb is a list and here we only want the first table

#### clean up the data #####
names(rawdata) = rawdata[2,] # using the second row as data names
tmp = rawdata[grepl("from", rawdata$TICKETS),] # select rows of rawdata whose TICKETS value contains "from"
tmp2 = tmp[,!duplicated(names(tmp))] # delete columns that have duplicated column names
res = as_tibble(tmp2) # convert to tibble

For the cleanup part, I worked step by step by looking at the data. Of course, there are many ways to do the same task.
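Since the live ESPN page may change or be unreachable, here is a minimal offline sketch of the same cleanup steps on a small made-up data frame. The values, the placeholder names `X1` to `X4`, and the variables `dedup` and `games` are all hypothetical and only imitate the shape described above (real header in row 2, header rows repeated, a duplicated TICKETS column). The duplicated columns are dropped before the row filter here, which is a slightly different order from the code above.

```r
library(tibble)

# Hypothetical stand-in for the scraped table (made-up values):
# row 1 is a section title, row 2 holds the real header, and the
# header row is repeated further down.
rawdata <- data.frame(
  X1 = c("Regular Season", "WK", "1", "2", "WK", "3"),
  X2 = c("Regular Season", "DATE", "Sun, Sep 9", "Sun, Sep 16", "DATE", "Sun, Sep 23"),
  X3 = c("Regular Season", "TICKETS", "Tickets from $50", "Tickets from $60",
         "TICKETS", "Tickets from $70"),
  X4 = c("Regular Season", "TICKETS", "", "", "TICKETS", ""),
  stringsAsFactors = FALSE
)

# Promote the second row to column names; unlist() turns the
# one-row data frame into a plain character vector.
names(rawdata) <- unlist(rawdata[2, ], use.names = FALSE)

# Drop columns whose name repeats an earlier column name.
dedup <- rawdata[!duplicated(names(rawdata))]

# Keep only the game rows, identified here by "from" in TICKETS.
games <- dedup[grepl("from", dedup$TICKETS), ]

# Convert the cleaned data frame to a tibble.
res <- as_tibble(games)
print(res)
```

If the duplicated columns should be kept rather than dropped, recent tibble versions also accept `as_tibble(x, .name_repair = "unique")`, which renames them instead of failing on the repeated names.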