Incorrect number of dimensions for a data frame

Time: 2017-03-03 16:01:41

Tags: r dataframe

I am new to the R language, and I have an assignment in which I need to draw a box plot from the data in an HTML table on Wikipedia:

library("rvest")
library("ggplot2")
library("dplyr")
url <- "https://en.wikipedia.org/wiki/List_of_countries_by_oil_exports"
Countries <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="mw-content-text"]/table[2]') %>% 
  html_table(header=TRUE, fill=TRUE)
Countries <- Countries
head(Countries)
str(Countries)
for(i in 1:74){
     Countries[i,3] = as.numeric(Countries[i,3])
}
#ggplot(Oil_Exports) + geom_boxplot() +
#  ylab("Amount of oil Exports in (bbl/day)") +
#  opts(title = "List of countries by oil exports")

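If I am on the right track, I am currently trying to convert the values in every row of column 3 (Oil - exports (bbl/day)) to numeric. The str() output and the error I get are: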

List of 1
 $ :'data.frame':   74 obs. of  6 variables:
  ..$ Rank                   : int [1:74] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ Country/Region         : chr [1:74] "Saudi Arabia" "Russia" "Kuwait" "Iran" ...
  ..$ Oil - exports (bbl/day): chr [1:74] "6,880,000" "4,720,000" "2,750,000" "2,445,000" ...
  ..$ Date of
information   : chr [1:74] "2011 est." "2013 est." "2016 est." "2011 est." ...
  ..$ Oil - exports (bbl/day): chr [1:74] "8,865,000" "7,201,000" "2,300,000" "1,808,000" ...
  ..$ Date of
information   : int [1:74] 2012 2012 2012 2012 2016 2014 2012 2012 2012 2012 ...
Error in Countries[i, 3]: incorrect number of dimensions
Traceback:

How can I fix this, and is there a better way to do it? Thanks.

1 answer:

Answer 0 (score: 2)

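The output of your scraping script is a list, not a data.frame, which is why Countries[i, 3] fails with "incorrect number of dimensions". I think you just want to extract the data.frame that is the first element of this list. To do that, simply add Countries <- Countries[[1]]:
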
library("rvest")
library("ggplot2")
library("dplyr")
url <- "https://en.wikipedia.org/wiki/List_of_countries_by_oil_exports"
Countries <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="mw-content-text"]/table[2]') %>% 
  html_table(header=TRUE, fill=TRUE)

Countries <- Countries[[1]]
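
To see the difference without hitting the network, here is a minimal sketch. The names scraped and oil and the three-row data frame are made up for illustration; they only mimic the shape of what html_table() returned in the question (a list holding one table).

```r
# Hypothetical stand-in for the scraped result: a list holding one data.frame.
scraped <- list(
  data.frame(
    Rank = 1:3,
    Country = c("Saudi Arabia", "Russia", "Kuwait"),
    Exports = c("6,880,000", "4,720,000", "2,750,000"),
    stringsAsFactors = FALSE
  )
)

class(scraped)         # a plain list, so [row, column] indexing is not defined
# scraped[1, 3]        # would fail with "incorrect number of dimensions"

oil <- scraped[[1]]    # [[1]] extracts the first (and only) element
class(oil)             # now a data.frame
oil[1, 3]              # row/column indexing works on the data.frame
```

Note that single brackets, scraped[1], would return a list of length one again; the double brackets are what unwrap the element.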

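However, because the column contains commas as thousands separators, converting it to numeric will not work out of the box. Let's remove them: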

Countries[,3] <- gsub(",", "", Countries[,3])

Also, you do not need a loop to convert the variable, because as.numeric is vectorized:

Countries[,3] <- as.numeric(Countries[,3])
Countries[,3]
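
Putting the two steps together, a rough offline sketch could look like the following. The data frame oil and its column name Exports are assumptions made for illustration (the four values are the first ones shown in the str() output of the question), and the plot uses base R's boxplot() rather than ggplot2 only to keep the example dependency-free.

```r
# Made-up sample that mimics the scraped column (character values with commas).
oil <- data.frame(
  Country = c("Saudi Arabia", "Russia", "Kuwait", "Iran"),
  Exports = c("6,880,000", "4,720,000", "2,750,000", "2,445,000"),
  stringsAsFactors = FALSE
)

# as.numeric("6,880,000") would give NA with a warning, so strip the commas
# first and then convert the whole column in one vectorized call.
oil$Exports <- as.numeric(gsub(",", "", oil$Exports, fixed = TRUE))

str(oil$Exports)

# Box plot of the converted values.
boxplot(oil$Exports,
        main = "List of countries by oil exports",
        ylab = "Oil exports (bbl/day)")
```

Selecting the column by name with $ (or with [[3]]) always yields a plain vector. That matters if your version of rvest returns a tibble from html_table(), where Countries[,3] stays a one-column table instead of dropping to a vector.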