Question

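I have a data frame with 30 rows and 1000 columns. Some of its columns contain "nan" and "inf" values, and I want to replace them with zero. I tried some code but could not get the result I wanted. As a small example, I created a sample data frame like this: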

test <- data.frame(a = c("inf", 1, "inf"), b = c("nan", 3, "nan"),
                   stringsAsFactors = TRUE)  # factors were the data.frame() default before R 4.0.0

I have tried a lot of code for this, for example:

na_code <- c("nan", "inf")
for (i in seq_along(test)) {
    test[[i]][test[[i]] %in% na_code] <- 0
}

This produced the following warning: 1: In `[<-.factor`(`*tmp*`, thisvar, value = 0) : invalid factor level, NA generated. So I tried this instead:

for (i in seq_along(test)) {
    test[[i]][test[[i]] %in% na_code] <- NaN
}

Then I tried to replace the NaN values with zero:

test[is.na(test)]<-0

I get the same warning. Where am I going wrong? Thanks.
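The warning arises because the columns are factors. A factor can only hold values that are already among its levels, so assigning 0 (or NaN) to an element makes R store NA there instead. A minimal sketch on a hypothetical single factor `f` (not from the original post) reproduces this:

```r
# Hypothetical one-column example: a factor, like the columns in the question
f <- factor(c("inf", "1", "inf"))
levels(f)    # the only values f can hold: "1" and "inf"

# 0 is not one of the levels, so R warns
# "invalid factor level, NA generated" and stores NA in those positions
f[f == "inf"] <- 0
f
```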

Answer 1

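Here is a different, loop-free approach. First, we coerce the data to a character matrix with as.matrix. Then we replace the values with sub and convert the result to numbers with type.convert.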

type.convert(sub("inf|nan", "0", as.matrix(test)), as.is = TRUE)
#      a b
# [1,] 0 0
# [2,] 1 3
# [3,] 0 0

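You can coerce the result back to a data frame if you need to, but note that when the data is entirely numeric, a matrix is usually the better structure.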
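A sketch of that round trip, assuming the `test` data from the question. It makes two small changes that are our own choices rather than part of the answer: an anchored pattern `^(inf|nan)$`, so that only cells consisting exactly of "inf" or "nan" are replaced (the unanchored "inf|nan" would also rewrite part of a value such as "info"), and `storage.mode<-` instead of type.convert to turn the character matrix into numbers while keeping its dimensions. The names `m` and `out` are hypothetical.

```r
test <- data.frame(a = c("inf", 1, "inf"), b = c("nan", 3, "nan"),
                   stringsAsFactors = TRUE)

# Character matrix; factor columns become their labels
m <- as.matrix(test)
# Anchored pattern: whole-cell matches only; sub() keeps dim and dimnames
m <- sub("^(inf|nan)$", "0", m)
# Convert the storage to double while keeping the matrix shape
storage.mode(m) <- "double"

# Back to a data frame, only if one is really needed
out <- as.data.frame(m)
str(out)
```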

Answer 2

apply(test, 2, function(x){ ifelse(x %in% na_code, 0, x) } )

This returns:

     a   b  
[1,] "0" "0"
[2,] "1" "3"
[3,] "0" "0"

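The results are all character strings. You could convert them back to factors later, but I assume you want numbers, in which case you just need to wrap the ifelse call in as.numeric: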

apply(test, 2, function(x){ as.numeric(ifelse(x %in% na_code, 0, x)) } )

     a b
[1,] 0 0
[2,] 1 3
[3,] 0 0
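apply() always returns a matrix. If you would rather keep a data frame, one alternative (a sketch, not part of the original answer, assuming the `test` and `na_code` objects from the question) is to convert each column with lapply() and assign the result into `test[]`. Each column is turned into character first, because calling as.numeric() directly on a factor returns its internal integer codes rather than its labels.

```r
test <- data.frame(a = c("inf", 1, "inf"), b = c("nan", 3, "nan"),
                   stringsAsFactors = TRUE)
na_code <- c("nan", "inf")

# Assigning into test[] keeps the data.frame class and the column names
test[] <- lapply(test, function(x) {
  x <- as.character(x)        # use the factor labels, not the integer codes
  x[x %in% na_code] <- "0"    # replace the placeholder strings
  as.numeric(x)               # "0", "1", "3" become numbers
})
str(test)
```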

Answer 3

Because the columns are factors, you only need to change their levels:

as.data.frame(lapply(test, function(x) {
  # relabel the "nan"/"inf" levels as "0"; every element follows its level
  levels(x)[levels(x) %in% na_code] <- 0
  x
}))
#   a b
# 1 0 0
# 2 1 3
# 3 0 0
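Note that the columns of this result are still factors (with "0" now one of the levels), so you cannot do arithmetic on them yet. One way to finish the job is the R FAQ 7.10 idiom as.numeric(levels(f))[f], which converts a factor of numbers to numeric. A sketch applying it right after the relabelling, assuming the same `test` and `na_code`; the name `fixed` is hypothetical:

```r
test <- data.frame(a = c("inf", 1, "inf"), b = c("nan", 3, "nan"),
                   stringsAsFactors = TRUE)
na_code <- c("nan", "inf")

fixed <- as.data.frame(lapply(test, function(x) {
  # Relabel the offending levels; duplicate labels would be merged into one level
  levels(x)[levels(x) %in% na_code] <- "0"
  # Indexing with a factor uses its integer codes, so each element
  # picks up the numeric value of its own level
  as.numeric(levels(x))[x]
}))
str(fixed)
```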