Converting a factor that includes "." to numeric

Time: 2014-04-08 04:41:23

Tags: r

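I am working with a data set that uses periods (.) in place of NAs. The column I am looking at right now is a factor with 12 levels. I am trying to take a mean and, obviously, na.rm doesn't work. I went back and cleaned the data by changing the periods to NAs (pe94[pe94 == "."] <- NA), which seems to work. However, mean cannot take the mean of a factor, and when I convert the factor to numeric, the NAs become 3s. How do I get rid of this problem?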

1 answer:

Answer 0: (score: 1)

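I have had similar problems (and others) converting factors to numeric for mathematical analysis. However, I found a fairly simple solution that seems to work. Hope this helps...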

# Script to convert factor data to numeric data without losing or altering values

# Sample data frame with factor variables whose levels are numbers
factor.vector1 <- factor(x = c(111, 222, 333, 444, 555))
thousands <- c("1,000", "2,000", "3,000", "4,000", "5,000")
factor.vector2 <- factor(x = thousands)
df <- data.frame(factor.vector1, factor.vector2)

# Numbers stored as factors, without comma separators
# 1st: convert the column to character (this recovers the level labels)
df[, 1] <- as.character(df[, 1])
# 2nd: convert the column to numeric
df[, 1] <- as.numeric(df[, 1])

# Numbers stored as factors WITH comma separators
# If the numbers contain commas (e.g. 2,000), use gsub to remove them first
# If commas are not removed before conversion, those values become NA
df[, 2] <- gsub(",", "", df[, 2])
# 1st: convert the column to character (gsub already returns character; kept for symmetry)
df[, 2] <- as.character(df[, 2])
# 2nd: convert the column to numeric
df[, 2] <- as.numeric(df[, 2])
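
The same two-step conversion can be applied to the situation in the question, where "." marks a missing value. This is only a minimal sketch: the vector `x` below is made-up data standing in for the `pe94` column, which is not shown in the question. It also shows why calling as.numeric directly on a factor gives misleading numbers: it returns the internal level codes, not the labels.

```r
# Hypothetical column resembling the question's data: "." marks a missing value
x <- factor(c("10", ".", "25", "7", "."))

# Pitfall: as.numeric() on a factor returns the internal integer level codes,
# not the numbers shown in the labels
codes <- as.numeric(x)

# Convert via character instead: map "." to NA first, then parse the labels
chr <- as.character(x)
chr[chr == "."] <- NA
vals <- as.numeric(chr)

# na.rm now works because vals is a numeric vector with real NAs
mean(vals, na.rm = TRUE)  # 14
```

If the data comes from a file, it may be simpler to declare the missing-value marker at read time, for example read.csv("pe94.csv", na.strings = "."), where the file name is an assumption. The column is then typically read as numeric in the first place, provided it has no other non-numeric entries.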