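I have a data frame with NaN in some factor columns, and I would like to convert those values to NA (NaN seems to be a problem when using a linear regression object to predict on new data).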
> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
> tester1[is.nan(tester1)] = NA
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
> tester1[is.nan(tester1)] = "NA"
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
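Answer 0 (score: 16)
The problem is this: your vector is of character mode, so of course it is "not a number". The last element was interpreted as the string "NaN". Using is.nan only makes sense when the vector is numeric.
If you want a missing value in a character vector (so that it is handled correctly by regression functions), use NA_character_ (without any quotes).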
> tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
> tester1
[1] "2" "2" "3" "4" "2" "3" NA
> is.na(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
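A character vector never really contains NA or NaN, only the strings "NA" and "NaN". If for some reason a factor variable had values that were "NaN", you could simply use logical indexing: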
tester1[tester1 == "NaN"] = "NA"
# but that would not really be a missing value either
# and it might screw up a factor variable anyway.
tester1[tester1=="NaN"] <- "NA"
Warning message:
In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
invalid factor level, NAs generated
##########
> tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))
> tester1[tester1 =="NaN"] <- NA_character_
> tester1
[1] 2 2 3 4 2 3 <NA>
Levels: 2 3 4 NaN
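That last result may be surprising. The "NaN" level remains, but no element is "NaN" any more. Instead, the element that was "NaN" is now a true missing value, which print shows as <NA>.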
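If the leftover "NaN" level is unwanted, one option is to drop unused levels after the replacement. This is a minimal sketch that assumes the same `tester1` factor as above; `droplevels()` is a base R function, but using it here is a suggestion and not part of the original answer.

```r
# Same factor as in the example above, with a literal "NaN" level
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))

# Turn the "NaN" element into a true missing value
tester1[tester1 == "NaN"] <- NA

# The level "NaN" is still listed even though no element uses it
levels(tester1)

# Drop levels that no element uses any more
tester1 <- droplevels(tester1)
levels(tester1)
```

After `droplevels()` the factor keeps only the levels that still occur, which avoids carrying an empty "NaN" category into a model.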
Answer 1 (score: 6)
Edit:
Gavin Simpson reminded me in the comments that in your case there is a simpler way to convert what is really the string "NaN" to "NA":
tester1 <- gsub("NaN", "NA", tester1)
tester1
# [1] "2" "2" "3" "4" "2" "3" "NA"
Solution:
To detect which elements of a character vector are NaN, you need to convert the vector to a numeric vector:
tester1[is.nan(as.numeric(tester1))] <- "NA"
tester1
[1] "2" "2" "3" "4" "2" "3" "NA"
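Note that the code above stores the string "NA", which R does not treat as missing. The sketch below contrasts that with assigning an unquoted NA; the names `as_string` and `as_missing` are made up for illustration only.

```r
tester1 <- c("2", "2", "3", "4", "2", "3", NaN)

# Variant 1: replace with the string "NA" (still an ordinary character value)
as_string <- tester1
as_string[is.nan(as.numeric(as_string))] <- "NA"

# Variant 2: replace with an unquoted NA (a real missing value)
as_missing <- tester1
as_missing[is.nan(as.numeric(as_missing))] <- NA

is.na(as_string)[7]   # FALSE: "NA" is just text
is.na(as_missing)[7]  # TRUE: a genuine missing value
```

Only the second variant is recognized by `is.na()` and by functions that handle missing values.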
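Explanation:
There are a couple of reasons this did not work the way you expected.
First, although NaN stands for "not a number", it does have class "numeric" and only makes sense in a numeric vector.
Second, when it is included in a character vector, the symbol NaN is silently converted to the string "NaN". When you then test for NaN-ness, the string returns FALSE: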
class(NaN)
# [1] "numeric"
c("1", NaN)
# [1] "1" "NaN"
is.nan(c("1", NaN))
# [1] FALSE FALSE
Answer 2 (score: 6)
You cannot have NaN in a character vector, and a character vector is what you have:
> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> is.nan(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> tester1
[1] "2" "2" "3" "4" "2" "3" "NaN"
Note how R treats it as a string.
You can create NaN in a numeric vector:
> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> as.numeric(tester1)
[1] 2 2 3 4 2 3 NaN
> is.nan(as.numeric(tester1))
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Then, of course, R can convert NaN to NA as in your code:
> foo <- as.numeric(tester1)
> foo[is.nan(foo)] <- NA
> foo
[1] 2 2 3 4 2 3 NA
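Since the question is about factor columns in a data frame, the same idea can be applied column by column. This is only a sketch under assumptions: the data frame `df` and its columns `a`, `b` and `n` are hypothetical, and dropping the unused "NaN" level is an extra step not discussed in the answers.

```r
# Hypothetical data frame: two factor columns with a "NaN" level, one numeric column
df <- data.frame(
  a = factor(c("2", "3", "NaN")),
  b = factor(c("x", "NaN", "y")),
  n = c(1, NaN, 3)
)

# Find the factor columns
is_fac <- vapply(df, is.factor, logical(1))

# In each factor column, turn "NaN" entries into real NA and drop the unused level
df[is_fac] <- lapply(df[is_fac], function(f) {
  f[f %in% "NaN"] <- NA
  droplevels(f)
})

levels(df$a)
is.na(df$b)
```

Using `%in%` instead of `==` keeps the index free of NA even if a column already contains missing values. The numeric column `n` is left untouched here; it could be handled separately with `is.nan()` as shown above.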