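I am trying to subset a data frame on the NA values of one field. I have read several other solutions online for subsetting on NA, but none of them has worked. Below is a summary of the field I am trying to subset on. You can see there are 297,895 NAs.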
summary(mc_masterc[,7])
12 24 36 48 60 72 84 96 108
3220459 1276362 338254 190636 114982 73042 48081 32001 20310
120 132 144 <NA>
13565 7655 3700 297895
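When I subset using is.na, I get a data frame with 0 observations. I have tried defining this field as both numeric and factor, and I get the same result. The only way I have gotten this to work is to give the NAs a numeric value. I am guessing I am missing something simple here, but I have not figured it out yet.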
df <- mc_masterc[is.na(mc_masterc[,7]),]
df
<0 rows> (or 0-length row.names)
Update: Thanks for the feedback. Here is some additional information:
str(mc_masterc[,7])
Factor w/ 13 levels "12","24","36",..: 1 1 1 1 1 1 1 1 1 1 ...
I also tried with mc_master[,7] as numeric and got the same result. I also tried the following three and still got zero rows.
df <- mc_masterc[mc_masterc[,7]=="NA",]
df <- mc_masterc[mc_masterc[,7]=="<NA>",]
df <- mc_masterc[mc_masterc[,7]=="",]
levels(mc_masterc[,7])
[1] "12" "24" "36" "48" "60" "72" "84" "96" "108" "120" "132" "144" NA
class(mc_masterc)
[1] "data.frame"
Answer 0 (score: 1)
The reason is that there are no missing values in that column of your data.frame. Instead, the factor has a level named NA. An NA (missing) value can be converted into an extra level using the function addNA. Once NA has been converted into an extra level, it is no longer a missing value, so is.na() will not detect it. Comparing the column with == does not help either, because the label of that level is itself NA and any comparison with NA returns NA rather than TRUE.
The help for addNA (as shown in RStudio) says:
addNA modifies a factor by turning NA into an extra level (so that NA values are counted in tables, for instance).
How do you subset in such cases? The simple way is to first convert the column with as.character and then check it with is.na.
Hence, a solution could be:
mc_masterc[is.na(as.character(mc_masterc[,7])),]
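The behaviour described above can be reproduced on a small made-up data frame. This is only a sketch: the names df, id, and term are hypothetical stand-ins for mc_masterc and its seventh column, and addNA is used here on the assumption that the real column was built in a similar way.

```r
# Hypothetical stand-in for mc_masterc: `term` is a factor whose NA values
# were turned into an extra level with addNA().
df <- data.frame(
  id   = 1:5,
  term = addNA(factor(c("12", "24", NA, "12", NA)))
)

levels(df$term)       # the last level is NA itself, not the string "NA"
sum(is.na(df$term))   # is.na() inspects the integer codes, none of which is missing

# Converting to character maps the NA level back to a real missing value,
# so is.na() can find those rows again.
missing_rows <- df[is.na(as.character(df$term)), ]
missing_rows
```

With this toy data, missing_rows should contain the two rows whose term is the NA level (ids 3 and 5), while the plain is.na(df$term) test finds none.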
Earlier answer before updates from OP:
One can try something like:
mc_masterc[(mc_masterc[,7]) == "NA",]
An example using test data that resembles the OP's data set:
> x[x[,3]=="NA",]
# a b c
#1 3 10 NA
#3 11 4 NA
#4 18 8 NA
> summary(x[,3])
# 3 4 NA
# 1 1 3
x
# a b c
#1 3 10 NA
#2 12 1 3
#3 11 4 NA
#4 18 8 NA
#5 14 6 4
str(x)
#'data.frame': 5 obs. of 3 variables:
# $ a: int 3 12 11 18 14
# $ b: int 10 1 4 8 6
# $ c: Factor w/ 3 levels "3","4","NA": 3 1 3 3 2
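For anyone who wants to rerun the earlier example, the test data can be rebuilt from the str() output above. This is a reconstruction, not the answerer's original code; the key assumption is that column c holds the literal string "NA" as a level, which is why == "NA" matches while is.na() does not.

```r
# Rebuild the test data from the str() output; "NA" here is an ordinary
# character label, not a missing value.
x <- data.frame(
  a = c(3L, 12L, 11L, 18L, 14L),
  b = c(10L, 1L, 4L, 8L, 6L),
  c = factor(c("NA", "3", "NA", "NA", "4"))
)

hits <- x[x[, 3] == "NA", ]   # string comparison finds rows 1, 3 and 4
hits

sum(is.na(x[, 3]))            # no genuinely missing values in this column
```

This only works because the level is the text "NA". It does not apply to the factor in the updated question, where levels() shows an unquoted NA.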
Answer 1 (score: 1)
Here's what I suspect is happening. My guess is that column number 7 is really a factor variable, one of whose levels is the string "<NA>".
test <- data.frame(one = c(2,4,6,'<NA>'), two=letters[1:4])
> test[is.na(test$one) ,]
[1] one two
<0 rows> (or 0-length row.names)
> test[test$one == "NA" ,]
[1] one two
<0 rows> (or 0-length row.names)
> test[test$one == "<NA>" ,]
one two
4 <NA> d
table(test$one)
<NA> 2 4 6
1 1 1 1
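Since the two answers point to different encodings of "missing", a quick check can show which one a column actually uses. The helper below is a hypothetical sketch (describe_missing is not a base R function or part of either answer); it counts genuinely missing values, values stored as an NA level, and values stored as text labels such as "NA", "<NA>" or "".

```r
# Hypothetical helper: count how "missing" is encoded in a factor-like column.
describe_missing <- function(f) {
  f <- as.factor(f)
  codes <- as.integer(f)
  c(
    # genuinely missing: the integer code itself is NA
    true_na    = sum(is.na(codes)),
    # NA stored as an extra level (as addNA() does): code is valid, label is NA
    na_level   = sum(is.na(levels(f))[codes], na.rm = TRUE),
    # "missing" stored as ordinary text labels
    text_label = sum(as.character(f) %in% c("NA", "<NA>", ""))
  )
}

r_level <- describe_missing(addNA(factor(c("12", NA, "24", NA))))
r_true  <- describe_missing(factor(c("12", NA)))
r_text  <- describe_missing(c("2", "4", "6", "<NA>"))
r_level
r_true
r_text
```

Whichever count is non-zero suggests the matching test: is.na(col) for true_na, is.na(as.character(col)) for na_level, and a string comparison such as col == "<NA>" for text_label.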