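I have a data frame containing a mix of numeric and factor variables.
I am trying to recursively replace all outliers (more than 3 SD from the mean) with NA, but I run into the following error: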
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
The code I am using is:
name = factor(c("A","B","NA","D","E","NA","G","H","H"))
height = c(120,NA,150,170,NA,146,132,210,NA)
age = c(10,20,0,30,40,50,60,NA,130)
mark = c(100,0.5,100,50,90,100,NA,50,210)
data = data.frame(name=name,mark=mark,age=age,height=height)
data
data[is.na(data)] <- 77777
data.scale <- scale(data)
data.scale[ abs(data.scale) > 3 ] <- NA
data <- data.scale
Any suggestions on how to get this working?
Answer 0 (score: 1)
Here is one approach:
library(dplyr)
# take note of order for column names
data.names <- colnames(data)
# scale all numeric columns
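data.numeric <- select_if(data, is.numeric) %>% # subset of numeric columns
  mutate_all(function(x) as.numeric(scale(x))) # scale each column separately; as.numeric() drops the one-column matrix that scale() returns
data.numeric[abs(data.numeric) > 3] <- NA # set values more than 3 SD from the mean, in either direction, to NA (none in this example)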
# combine results with subset data frame of non-numeric columns
data <- data.frame(select_if(data, function(x) !is.numeric(x)),
                   data.numeric)
# restore columns to original order
data <- data[, data.names]
> data
  name        mark         age     height
1    A  0.20461856 -0.80009469 -1.0844636
2    B -1.43232992 -0.55391171         NA
3   NA  0.20461856 -1.04627767 -0.1459855
4    D -0.61796862 -0.30772873  0.4796666
5    E  0.04010112 -0.06154575         NA
6   NA  0.20461856  0.18463724 -0.2711159
7    G          NA  0.43082022 -0.7090723
8    H -0.61796862          NA  1.7309707
9    H  2.01431035  2.15410109         NA
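Note: with this approach the non-numeric (character/factor/etc.) variables end up before the numeric ones. That is why the last step restores the original column order (if that matters to you).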
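If you would rather avoid the reordering step and also want the "recursive" behaviour from the question, here is a rough base R sketch. It is only one possible reading of the question: the helper name `mask_outliers` and its arguments `k` and `repeat_until_stable` are made up for illustration, and it assumes you want to keep the original units (outliers become NA, everything else is left unscaled) rather than return z-scores.

```r
# Hypothetical helper, not part of the answer above.
# Sets values more than `k` standard deviations from the column mean to NA.
# Only numeric columns are touched, so factors stay as they are and the
# column order is preserved.
mask_outliers <- function(df, k = 3, repeat_until_stable = FALSE) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df[numeric_cols] <- lapply(df[numeric_cols], function(x) {
    repeat {
      # z-scores computed from the values that are currently non-missing
      z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
      outlier <- !is.na(z) & abs(z) > k
      if (!any(outlier)) break
      x[outlier] <- NA
      # a single pass unless the caller asked for repeated passes
      if (!repeat_until_stable) break
    }
    x
  })
  df
}

# Made-up demo data: thirty ordinary values plus two extreme ones.
demo <- data.frame(
  id    = factor(sprintf("s%02d", 1:32)),
  value = c(rep(10, 30), 100, 1000)
)

once   <- mask_outliers(demo)                              # one pass
stable <- mask_outliers(demo, repeat_until_stable = TRUE)  # repeat until nothing changes
```

On this demo data a single pass only flags 1000, because that value inflates the SD enough to hide 100. Recomputing the mean and SD after each pass then flags 100 as well. Note that a z-score can never exceed (n - 1) / sqrt(n) for n non-missing values, so a column needs at least 11 of them before anything can be more than 3 SD out. That is why nothing gets flagged in the nine-row example from the question.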