I'm trying to help a friend by pulling data on the Miami Dolphins' football games.
library(htmltab)
library(tidyr)
library(tibble)
url <- "http://www.espn.com/nfl/team/schedule/_/name/mia"
data <- htmltab(doc = url, which = 1, header = 2)
unique(data)
as_tibble(data)
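The table comes back with identical headers (variable names). I'm missing something. I need some help converting the htmltab output to a tibble. Thanks.
Answer 0 (score: 0)
So, I'm using the "rvest" package to get the data from the website. I think the main problem is that the site does not provide a clean table format that can be used directly. You have to clean it up to get the desired output.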
rm(list=ls())
library(tidyverse)
library(rvest)
##### get data from web #####
url = "http://www.espn.com/nfl/team/schedule/_/name/mia"
tb <- url %>%
read_html() %>%
html_table() # this function is actually going to read all tables at this url
rawdata = tb[[1]] # tb is a list and here we only want the first table
#### clean up the data #####
names(rawdata) = as.character(unlist(rawdata[2,])) # using the second row as column names
tmp = rawdata[grepl("from",rawdata$TICKETS),] # select rows that contain "from" (rawdata, not the undefined `data`)
tmp2 = tmp[,!duplicated(names(tmp))] # delete columns that have duplicated column names
res = as_tibble(tmp2) # convert to tibble
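For the cleaning part, I worked through it step by step by looking at the data. Of course, there are many ways to accomplish the same task.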
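To try the cleaning steps without hitting the website, here is a minimal sketch on a small made-up data frame. The contents of `rawdata` below (team names, prices, the `X1`..`X4` column names) are hypothetical stand-ins for what `html_table()` might return, not the real ESPN table; only the shape (a title row, a header row with a repeated name, and "from" in the TICKETS cells) is assumed to match.

```r
# Hypothetical stand-in for the first scraped table:
# row 1 is a section title, row 2 holds the real headers (one repeated),
# and only actual game rows mention "from" in the TICKETS column.
rawdata <- data.frame(
  X1 = c("Regular Season", "WK", "1", "2", "3"),
  X2 = c("Regular Season", "OPPONENT", "vs Team A", "@ Team B", "BYE WEEK"),
  X3 = c("Regular Season", "TICKETS", "Tickets from $50", "Tickets from $72", ""),
  X4 = c("Regular Season", "TICKETS", "", "", ""),
  stringsAsFactors = FALSE
)

# Promote the second row to column names
names(rawdata) <- as.character(unlist(rawdata[2, ]))

# Rows to keep: TICKETS cell contains "from"
# ([[ picks the first column with that name when names repeat)
keep_rows <- grepl("from", rawdata[["TICKETS"]])

# Columns to keep: first occurrence of each column name
keep_cols <- !duplicated(names(rawdata))

# Apply both filters in one step and reset the row names
games <- rawdata[keep_rows, keep_cols, drop = FALSE]
rownames(games) <- NULL

print(games)
```

Once the column names are unique, `tibble::as_tibble(games)` should convert the result without complaining about duplicated names.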