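I'm working with a dataset that uses periods (.) in place of NAs. The column I'm looking at is a factor with levels 1, 2 and ".". I'm trying to take a mean, and na.rm obviously doesn't work. I went back and cleaned the data by changing the periods to NAs (pe94[pe94 == "."] <- NA), which seemed to work. However, mean can't take the mean of a factor, and when I convert the factor to numeric, the NAs become 3s. How do I get around this?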
Answer 0 (score: 1)
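I had a similar problem (among others) converting factors to numbers for mathematical analysis. However, I found a fairly simple solution that seems to work. Hope this helps...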
#Script to convert factor data to numeric data without loss or alteration of values
#Sample data frame with factor variables represented by numbers
factor.vector1<-factor(x=c(111,222,333,444,555))
thousands<-c("1,000","2,000","3,000","4,000","5,000")
factor.vector2<-factor(x=thousands)
df<-data.frame(factor.vector1, factor.vector2)
#Numbers stored as factors, without comma thousands separators
#1st: convert the column to character (as.numeric() on a factor would return its level codes, not its values)
df[,1]<-as.character(df[,1])
#2nd: convert the column to numeric
df[,1]<-as.numeric(df[,1])
#Numbers stored as factors WITH comma thousands separators
#If the numbers contain commas (e.g. 2,000), use gsub to remove them
#If commas are not removed before conversion, the values containing commas will become NA
df[,2]<-gsub(",", "", df[,2])
#1st: convert the column to character (gsub already returns character, so this step is harmless but redundant)
df[,2]<-as.character(df[,2])
#2nd: convert the column to numeric
df[,2]<-as.numeric(df[,2])
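The same conversion can also be written with the idiom from R FAQ 7.10, as.numeric(levels(f))[f], which parses each distinct label once and then indexes by the factor's codes. A sketch, assuming a hypothetical helper named factor_to_numeric (not a base R function) that also strips commas from the labels:

```r
# Hypothetical helper (not part of base R): convert a factor whose labels
# are numbers, optionally with comma thousands separators, to numeric.
factor_to_numeric <- function(f) {
  # Strip commas from the level labels, parse them once, then index by
  # the factor's codes so each element gets the value of its own label.
  as.numeric(gsub(",", "", levels(f)))[f]
}

# Same sample data as in the answer above.
factor.vector1 <- factor(x = c(111, 222, 333, 444, 555))
factor.vector2 <- factor(x = c("1,000", "2,000", "3,000", "4,000", "5,000"))
df <- data.frame(factor.vector1, factor.vector2)

# Convert every factor column in one pass; other columns are left alone.
df[] <- lapply(df, function(col) if (is.factor(col)) factor_to_numeric(col) else col)
str(df)
```

NA entries in the factor stay NA, because indexing by an NA code yields NA. The help page for factor recommends this form as slightly more efficient than as.numeric(as.character(f)).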