How to get a table from a website (scraping)

Date: 2020-03-18 09:40:08

Tags: r xml rvest xml2

Please help me get the table from this website into RStudio: https://www.worldometers.info/coronavirus/#countries

I have been learning R from scratch for about a month, and this is what I have tried so far:

library(XML)     
library(rvest)
library(xml2)

url<-("https://www.worldometers.info/coronavirus/#countries")

covid<-readHTMLTable(url,which=1)

head(covid)

It produces this error message:

> url<-("https://www.worldometers.info/coronavirus/#countries")
> covid<-readHTMLTable(url,which=1)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message:
XML content does not seem to be XML: '' 

Please help.

1 answer:

Answer 0 (score: 1)

We can use rvest to get the data.

library(rvest)
url <- "https://www.worldometers.info/coronavirus/#countries"

url %>% 
  read_html() %>%
  html_table() %>%
  .[[1]] %>%
  replace(., . == '', NA)


#  Country,Other TotalCases NewCases TotalDeaths NewDeaths TotalRecovered ActiveCases Serious,Critical Tot Cases/1M pop
#1         China     80,894      +13       3,237        11         69,614       8,043            2,622               56
#2         Italy     31,506     <NA>       2,503        NA          2,941      26,062            2,060              521
#3          Iran     16,169     <NA>         988        NA          5,389       9,792             <NA>              193
#4         Spain     11,826     <NA>         533        NA          1,028      10,265              563              253
#5       Germany      9,414      +47          26        NA             71       9,317                2              112
#6      S. Korea      8,413      +93          84         3          1,540       6,789               59              164
#...
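
As for the error in the question: as far as I know, XML::readHTMLTable cannot download https pages by itself, so it ends up with empty content ("XML content does not seem to be XML: ''"). One possible workaround is to download the page as text with another package and parse that text. The sketch below is only an illustration: first_table is a hypothetical helper name, the httr usage is commented out because it needs network access, and it assumes the page still serves the table as static HTML.

```r
library(XML)

# Hypothetical helper: parse HTML that has already been downloaded as text
# and return the first table found in it.
first_table <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  readHTMLTable(doc, which = 1, stringsAsFactors = FALSE)
}

# Usage with the live page (assumes httr is installed and the site is reachable):
# resp  <- httr::GET("https://www.worldometers.info/coronavirus/")
# covid <- first_table(httr::content(resp, as = "text", encoding = "UTF-8"))

# Offline check with a tiny made-up table
demo <- paste0(
  "<html><body><table>",
  "<tr><th>Country</th><th>TotalCases</th></tr>",
  "<tr><td>China</td><td>80,894</td></tr>",
  "</table></body></html>"
)
first_table(demo)
```

The rvest pipeline above does the download and the parsing in one step, so this is only relevant if you want to stay with the XML package.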

You can look at readr::parse_number to convert columns such as TotalCases and NewCases to numeric.
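
A minimal sketch of that conversion, assuming readr is installed. The small data frame is a hypothetical stand-in for the first rows of the scraped table (values copied from the output above), and num_cols is just an illustrative name for the columns to convert.

```r
library(readr)

# Hypothetical stand-in for the first rows of the scraped table
covid <- data.frame(
  Country    = c("China", "Italy", "Iran"),
  TotalCases = c("80,894", "31,506", "16,169"),
  NewCases   = c("+13", NA, NA),
  stringsAsFactors = FALSE
)

# Columns that hold numbers stored as text
num_cols <- c("TotalCases", "NewCases")

# parse_number() drops non-numeric characters such as "," and "+"
# around the number; missing values stay NA
covid[num_cols] <- lapply(covid[num_cols], parse_number)

str(covid)
```

With the real table you would list every count column in num_cols; the exact column names depend on what html_table() returns for the page at the time you scrape it.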