R cannot convert NaN to NA

Asked: 2012-02-27 22:07:44

Tags: r nan na

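I have a data frame with factor columns that contain NaN, and I would like to convert those to NA (the NaN values seem to cause problems when I use a linear regression object to predict on new data).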

> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"
> tester1[is.nan(tester1)] = NA
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"
> tester1[is.nan(tester1)] = "NA"
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"

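3 answers:

Answer 0: (score: 16)

Here is the problem: your vector is of mode character, so of course it is "not a number". The last element was interpreted as the string "NaN". Using is.nan only makes sense when the vector is numeric. If you want a missing value in a character vector (so that regression functions handle it properly), use NA_character_, without any quotes: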

> tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
>  tester1
[1] "2" "2" "3" "4" "2" "3" NA 
>  is.na(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

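In a character vector, "NA" and "NaN" are just ordinary strings, not missing values. If for some reason a factor variable contains the value "NaN", you can use logical indexing: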

tester1[tester1 == "NaN"] = "NA"  
# but that would not really be a missing value either 
# and it might screw up a factor variable anyway.

tester1[tester1=="NaN"] <- "NA"
Warning message:
In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
invalid factor level, NAs generated
##########
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))

> tester1[tester1 =="NaN"] <- NA_character_
> tester1
[1] 2    2    3    4    2    3    <NA>
Levels: 2 3 4 NaN

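That last result might be surprising. The "NaN" level remains, but none of the elements is "NaN". Instead, the element that was "NaN" is now a real missing value, which print shows as <NA>.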
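If the leftover "NaN" level is unwanted, one option is to drop unused levels afterwards. This is a minimal sketch, assuming base R's droplevels() is available (it is not part of the original answer):

```r
# Factor with a stray "NaN" level, as in the example above
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))

# Turn the "NaN" elements into real missing values
tester1[tester1 == "NaN"] <- NA

# The level "NaN" is still listed even though no element uses it
levels(tester1)
# [1] "2"   "3"   "4"   "NaN"

# droplevels() removes levels that no longer occur in the data
tester1 <- droplevels(tester1)
levels(tester1)
# [1] "2" "3" "4"
```

Whether the empty level matters depends on the modelling function, so treat this step as optional.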

Answer 1: (score: 6)

Edit

Gavin Simpson reminded me in the comments that, in your case, there is a simpler way to convert what is really the string "NaN" to "NA":

tester1 <- gsub("NaN", "NA", tester1)
tester1
# [1] "2"  "2"  "3"  "4"  "2"  "3"  "NA"
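Note that the result of gsub() is the string "NA", not a missing value. The following sketch contrasts the two; the variable names as_string and as_missing are made up for illustration:

```r
tester1 <- c("2", "2", "3", "4", "2", "3", NaN)

# gsub() yields the two-character string "NA", which is.na() does not flag
as_string <- gsub("NaN", "NA", tester1)
is.na(as_string)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

# Assigning the NA constant through logical indexing gives a true missing value
as_missing <- tester1
as_missing[as_missing == "NaN"] <- NA
is.na(as_missing)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
```

If the goal is for downstream functions to treat the entry as missing, the second form is probably the one to use.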

Solution:

To detect which elements of a character vector are NaN, you need to convert the vector to a numeric vector first:

tester1[is.nan(as.numeric(tester1))] <- "NA"
tester1
[1] "2"  "2"  "3"  "4"  "2"  "3"  "NA"

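Explanation

There are a couple of reasons this did not work as you expected.

First, although NaN stands for "Not a Number", it does have class "numeric", and it only makes sense within a numeric vector.

Second, when it is included in a character vector, the symbol NaN is silently converted to the character string "NaN". When you then test that string for NaN-ness, is.nan returns FALSE: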

class(NaN)
# [1] "numeric"
c("1", NaN)
# [1] "1"   "NaN"
is.nan(c("1", NaN))
# [1] FALSE FALSE

Answer 2: (score: 6)

You cannot have a NaN in a character vector, and a character vector is what you have:

> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> is.nan(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> tester1
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"

Note how R treats the last element as a character string.

You can have a NaN in a numeric vector:

> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> as.numeric(tester1)
[1]   2   2   3   4   2   3 NaN
> is.nan(as.numeric(tester1))
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Then, of course, R can convert NaN to NA just as your code intended:

> foo <- as.numeric(tester1)
> foo[is.nan(foo)] <- NA
> foo
[1]  2  2  3  4  2  3 NA
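Since the question is about factor columns in a data frame, the ideas from the answers can be combined into a per-column helper. This is only a sketch: nan_to_na and the example data frame df are hypothetical names and data, not something from the original posts.

```r
# Hypothetical helper: replace "NaN" entries in one column with NA
nan_to_na <- function(x) {
  if (is.factor(x)) {
    # Setting a level to NA makes its elements NA and removes the level
    levels(x)[levels(x) == "NaN"] <- NA
  } else if (is.character(x)) {
    # The string "NaN" becomes a real missing value
    x[!is.na(x) & x == "NaN"] <- NA
  } else if (is.numeric(x)) {
    # A genuine numeric NaN becomes NA
    x[is.nan(x)] <- NA
  }
  x
}

# Made-up example data with one factor column and one numeric column
df <- data.frame(
  grp = factor(c("2", "3", "NaN")),
  val = c(1.5, NaN, 3)
)

# df[] keeps the data frame structure while replacing every column
df[] <- lapply(df, nan_to_na)
is.na(df$grp)
# [1] FALSE FALSE  TRUE
levels(df$grp)
# [1] "2" "3"
```

Handling each column type separately avoids calling is.nan on character data, which is the mistake the answers above describe.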