Question

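I'm working with a dataset that uses periods (.) in place of NAs. The column I'm looking at right now is a factor with levels 1, 2, and ".". I tried to take a mean, and obviously na.rm didn't work. I went back and cleaned the data by changing the periods to NAs (pe94[pe94 == "."] <- NA), which seemed to work. However, mean can't take the mean of a factor, and when I convert the factor to numeric, the NAs become 3s. How do I get around this?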

Answer 1

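I had a similar problem (among others) converting factors to numbers for mathematical analysis. However, I found a fairly simple solution that seems to work. Hope this helps...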

#Script to convert factor data to numeric data without loss or alteration of values

#Sample data frame with factor variables represented by numbers
factor.vector1<-factor(x=c(111,222,333,444,555))
thousands<-c("1,000","2,000","3,000","4,000","5,000")
factor.vector2<-factor(x=thousands)
df<-data.frame(factor.vector1, factor.vector2)

#Numbers as factors without comma placeholders
#1st convert the column to character data type (this recovers the labels, not the level codes)
df[,1]<-as.character(df[,1])
#2nd convert the column to numeric data type
df[,1]<-as.numeric(df[,1])

#Numbers as factors WITH comma placeholders
#If the numbers contain commas (e.g. 2,000), use gsub to remove them first
#If commas are not removed before conversion, any value containing a comma will become NA
df[,2]<-gsub(",", "", df[,2])
#gsub already returns a character vector, so this step is redundant but harmless
df[,2]<-as.character(df[,2])
#2nd convert the column to numeric data type
df[,2]<-as.numeric(df[,2])
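
The same character-then-numeric route can be applied to the case in the question. The following is only a minimal sketch: the vector `x` is a made-up stand-in for one column of `pe94` (the real data is not shown in the question), and it assumes the only non-numeric label is ".".

```r
# Hypothetical stand-in for one pe94 column: a factor with labels "1", "2" and "."
x <- factor(c("1", "2", ".", "2", "1", "."))

# as.numeric(x) would return the internal level codes (positions in levels(x)),
# not the labels, which is why the converted values look wrong.

# Go through character first so the labels themselves are converted
x_chr <- as.character(x)

# Mark the period placeholder as missing before the numeric conversion
x_chr[x_chr == "."] <- NA

x_num <- as.numeric(x_chr)

# The mean now ignores the missing entries: mean of 1, 2, 2, 1
mean(x_num, na.rm = TRUE)  # 1.5
```

If the data is read from a text file, passing na.strings = "." to read.csv may avoid the problem at import time, since the column would then likely be read as numeric rather than as a factor or character column.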

Convert factor including "." to numeric
