Carc data from an .rda file to a numeric matrix

Date: 2014-01-18 10:47:31

Tags: r matrix dataset

I am trying to run KDA (kernel discriminant analysis) on the carc data, but when I call X<-data.frame(scale(X)); R reports the error:

"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"

I tried as.numeric(as.matrix(carc)) and carc<-na.omit(carc), but neither helped. Here is the full script:

library(ks);library(MASS);library(klaR);library(FSelector)
install.packages("klaR")
install.packages("FSelector")
library(ks);library(MASS);library(klaR);library(FSelector)
attach("carc.rda")
data<-load("carc.rda")
data
carc<-na.omit(carc)
head(carc)
class(carc) # check for its class 
class(as.matrix(carc)) # change class, and 
as.numeric(as.matrix(carc))
XX<-carc
X<-XX[,1:12];X.class<-XX[,13];
X<-data.frame(scale(X));
fit.pc<-princomp(X,scores=TRUE);
plot(fit.pc,type="line")
X.new<-fit.pc$scores[,1:5]; X.new<-data.frame(X.new);
cfs(X.class~.,cbind(X.new,X.class))
X.new<-fit.pc$scores[,c(1,4)]; X.new<-data.frame(X.new);
fit.kda1<-Hkda(x=X.new,x.group=X.class,pilot="samse",
bw="plugin",pre="sphere")
kda.fit1 <- kda(x=X.new, x.group=X.class, Hs=fit.kda1)

Could you help me fix this problem and carry out the analysis?

Addendum: the car dataset (Chambers, Cleveland, Kleiner & Tukey 1983)

> head(carc)
               P  M R78 R77   H    R Tr    W   L  T   D    G      C
AMC_Concord 4099 22   3   2 2.5 27.5 11 2930 186 40 121 3.58     US
AMC_Pacer   4749 17   3   1 3.0 25.5 11 3350 173 40 258 2.53     US
AMC_Spirit  3799 22   .   . 3.0 18.5 12 2640 168 35 121 3.08     US
Audi_5000   9690 17   5   2 3.0 27.0 15 2830 189 37 131 3.20 Europe
Audi_Fox    6295 23   3   3 2.5 28.0 11 2070 174 36  97 3.70 Europe

2 answers:

Answer 0 (score: 0)

Here is a small dataset with characteristics similar to the one you describe. It reproduces the error:

"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"

carc <- data.frame(type1=rep(c('1','2'), each=5),
                   type2=rep(c('5','6'), each=5),
                   x = rnorm(10,1,2)/10, y = rnorm(10),
                   stringsAsFactors = TRUE)  # needed in R >= 4.0 to get factor columns

This should resemble your data.frame:

str(carc)
# 'data.frame':  10 obs. of  4 variables:
# $ type1: Factor w/ 2 levels "1","2": 1 1 1 1 1 2 2 2 2 2
# $ type2: Factor w/ 2 levels "5","6": 1 1 1 1 1 2 2 2 2 2
# $ x    : num  -0.1177 0.3443 0.1351 0.0443 0.4702 ...
# $ y    : num  -0.355 0.149 -0.208 -1.202 -1.495 ...

scale(carc)
# Similar error
# Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
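
The error arises because scale() turns the data frame into a matrix before calling colMeans(), and a single factor or character column is enough to make that matrix non-numeric. A minimal sketch for finding the offending columns follows; the data frame df and its column names are hypothetical stand-ins, not the real carc data:

```r
# Toy data frame standing in for carc (hypothetical columns, not the real data).
df <- data.frame(
  price  = c(4099, 4749, 3799),
  rep78  = c("3", "3", "."),          # "." used as a missing-value marker
  origin = c("US", "US", "Europe"),
  stringsAsFactors = TRUE
)

# One class per column; anything that is not numeric or integer breaks scale().
col_classes <- sapply(df, class)
print(col_classes)

# Names of the columns that scale() cannot handle.
non_numeric <- names(df)[!sapply(df, is.numeric)]
print(non_numeric)
```

Running the same two sapply() calls on the real carc object should show which of the first 12 columns need converting or excluding.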

Using set() from data.table to convert those columns to numeric:

require(data.table)
DT <- data.table(carc)

cols_fix <- c("type1", "type2")
for (col in cols_fix) set(DT, j=col, value = as.numeric(as.character(DT[[col]])))

str(DT)
# Classes ‘data.table’ and 'data.frame':  10 obs. of  4 variables:
# $ type1: num  1 1 1 1 1 2 2 2 2 2
# $ type2: num  5 5 5 5 5 6 6 6 6 6
# $ x    : num  0.0465 0.1712 0.1582 0.1684 0.1183 ...
# $ y    : num  0.155 -0.977 -0.291 -0.766 -1.02 ...
# - attr(*, ".internal.selfref")=<externalptr> 
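
The as.character() step in the loop above matters. The sketch below, in base R, shows what happens without it. It also assumes that the "." entries visible in the R78 and R77 columns of the question's head(carc) output are missing-value markers, which is a guess from that output rather than something the question states. The vector f is a hypothetical example:

```r
# Hypothetical factor column in which "." marks a missing value.
f <- factor(c("3", "5", ".", "3"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers that are printed.
codes <- as.numeric(f)

# Going through as.character() recovers the printed values; "." becomes NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning).
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(values)
```

After such a conversion, na.omit() can drop the rows that contain NA before scale() is called.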

Answer 1 (score: 0)

The first columns of the dataset are probably factors. Taking the data from the corrgram package:

library(corrgram) 
carc <- auto
str(carc)
# 'data.frame':  74 obs. of  14 variables:
# $ Model : Factor w/ 74 levels "AMC Concord      ",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Origin: Factor w/ 3 levels "A","E","J": 1 1 1 2 2 2 1 1 1 1 ...
# $ Price : int  4099 4749 3799 9690 6295 9735 4816 7827 5788 4453 ...
# $ MPG   : int  22 17 22 17 23 25 20 15 18 26 ...
# $ Rep78 : num  3 3 NA 5 3 4 3 4 3 NA ...
# $ Rep77 : num  2 1 NA 2 3 4 3 4 4 NA ...
# $ Hroom : num  2.5 3 3 3 2.5 2.5 4.5 4 4 3 ...
# $ Rseat : num  27.5 25.5 18.5 27 28 26 29 31.5 30.5 24 ...
# $ Trunk : int  11 11 12 15 11 12 16 20 21 10 ...
# $ Weight: int  2930 3350 2640 2830 2070 2650 3250 4080 3670 2230 ...
# $ Length: int  186 173 168 189 174 177 196 222 218 170 ...
# $ Turn  : int  40 40 35 37 36 34 40 43 43 34 ...
# $ Displa: int  121 258 121 131 97 121 196 350 231 304 ...
# $ Gratio: num  3.58 2.53 3.08 3.2 3.7 3.64 2.93 2.41 2.73 2.87 ...

So try excluding them:

X<-XX[,3:14]

Or

X<-XX[,-(1:2)]
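
If the positions of the factor columns are not known in advance, the numeric columns can also be picked by type instead of by index. This is only a sketch of that idea; the toy XX below uses made-up rows and just four columns, so it is not the real carc data:

```r
# Toy stand-in for XX (hypothetical values; the real carc has more columns).
XX <- data.frame(
  Model  = factor(c("a", "b", "c", "d")),
  Price  = c(4099, 4749, 3799, 9690),
  MPG    = c(22, 17, 22, 17),
  Origin = factor(c("US", "US", "US", "Europe"))
)

# Keep only the columns that are already numeric, whatever their position.
is_num <- sapply(XX, is.numeric)
X <- XX[, is_num, drop = FALSE]

# scale() now works: each column is centered to mean 0 and scaled to sd 1.
X <- data.frame(scale(X))
str(X)
```

The class column (Origin here) can still be taken from XX separately, as the question does with X.class.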