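I need to read the BYU tuition data at http://yfacts.byu.edu/Article?id=85 into a data frame using the readHTMLTable function. I also need to clean the data and name the three variables "year", "lds", and "nonlds".
I have the following code: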
library("XML")
download.file("http://yfacts.byu.edu/Article?id=85",
destfile = "tuitiondata.html")
BYUtuition <- readHTMLTable("tuitiondata.html",
header=T, skip.rows=4,
colClasses=c("character","FormattedNumber","FormattedNumber"))
names(BYUtuition)<-c("year","lds","nonlds")
I get the following result:
BYUtuition
$`NULL`
V1
1 Tuition History
2 For Full-time Undergraduate Students
3 1960-61
4 ...
58 2015-16
59
60 * A significant portion of the cost of operating the university is paid from the tithes of The Church of Jesus Christ of Latter-day Saints. Therefore, students and families of students who are tithe-paying members of the Church have already made a contribution to the operation of the university. Because others will not have made this contribution, they are charged a higher tuition, a practice similar in principle to that of state universities charging higher tuition to nonresidents.
V2 V3
1 NA NA
2 NA NA
3 NA NA
4 NA NA
...
60 NA NA
> mormons<-mormons[[1]]
Error: object 'mormons' not found
> names(BYUtuition)<-c("year","lds","nonlds")
Error in names(BYUtuition) <- c("year", "lds", "nonlds") :
'names' attribute [3] must be the same length as the vector [1]
Can someone help me figure out what I'm doing wrong and what I need to do instead?
Thanks
Answer 0 (score: 2)
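Your BYUtuition is a list, not a data.frame: readHTMLTable returns a list of tables, which is why assigning three names fails against a vector of length 1. Use [[1]] to extract the data.frame inside it. You can then do the number formatting yourself instead of relying on FormattedNumber.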
BYUtuition <- readHTMLTable("tuitiondata.html",header=T,skip.rows=4)[[1]]
#remove rows with any NA
BYUtuition <- na.omit(BYUtuition)
#set names
names(BYUtuition) <- c("year","lds","nonlds")
#convert tuition amounts to numeric
BYUtuition$lds <- as.numeric(gsub("[^0-9a-zA-Z]+", "",BYUtuition$lds))
BYUtuition$nonlds <- as.numeric(gsub("[^0-9a-zA-Z]+", "",BYUtuition$nonlds))
#show final table
BYUtuition
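To try the same steps without downloading the page, here is a minimal sketch in base R. It assumes the parsed result looks like a one-element list holding a data.frame; the `tables` object and every value in it are hypothetical placeholders, not actual BYU tuition figures.

```r
# Hypothetical stand-in for the list that readHTMLTable() returns.
# All values are made up for illustration only.
tables <- list(
  data.frame(
    V1 = c("Tuition History", "1960-61", "1961-62", ""),
    V2 = c(NA, "$1,000", "$1,100", NA),
    V3 = c(NA, "$2,000", "$2,200", NA),
    stringsAsFactors = FALSE
  )
)

# The list has a single element, so giving it three names fails
length(tables)

# Extract the data.frame itself, then drop rows containing NA
tuition <- tables[[1]]
tuition <- na.omit(tuition)
names(tuition) <- c("year", "lds", "nonlds")

# Strip "$" and "," before converting the amounts to numeric
tuition$lds    <- as.numeric(gsub("[^0-9.]", "", tuition$lds))
tuition$nonlds <- as.numeric(gsub("[^0-9.]", "", tuition$nonlds))

str(tuition)
```

The pattern `[^0-9.]` used here is a slightly stricter variant of the one in the answer: it also removes letters, so stray text in a cell cannot end up inside the string passed to as.numeric.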