Question

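I am trying to subset a data frame on the NA values in one field. I have read several other solutions online for subsetting on NA, but none of them worked. Below is a summary of the field I am trying to subset on. You can see there are 297,895 NAs.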

summary(mc_masterc[,7])
     12      24      36      48      60      72      84      96     108 
3220459 1276362  338254  190636  114982   73042   48081   32001   20310 
    120     132     144    <NA> 
  13565    7655    3700  297895 

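When I subset using is.na, I get a data frame with 0 observations. I have tried defining this field as both numeric and factor, and I get the same result. The only way I have gotten this to work is to give NA a numeric value. I am guessing I am missing something simple here, but I haven't figured it out yet.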

  df <-  mc_masterc[is.na(mc_masterc[,7]),]
  df

<0 rows> (or 0-length row.names)

Update: Thanks for the feedback. Here is some additional information:

str(mc_masterc[,7])
 Factor w/ 13 levels "12","24","36",..: 1 1 1 1 1 1 1 1 1 1 ...

I also tried with mc_masterc[,7] as numeric and got the same result. I also tried the following three and still got zero rows.

df <-  mc_masterc[mc_masterc[,7]=="NA",]
df <-  mc_masterc[mc_masterc[,7]=="<NA>",]
df <-  mc_masterc[mc_masterc[,7]=="",]

levels(mc_masterc[,7])
[1] "12"  "24"  "36"  "48"  "60"  "72"  "84"  "96"  "108" "120" "132" "144" NA   

class(mc_masterc)
[1] "data.frame"

Answer 1

The reason is that there are no NA values in that column of your data.frame. Instead, the factor has a level named NA. Missing (NA) values can be converted into an extra level with the function addNA. Once NA has been turned into a level, it is no longer a missing value, so is.na() returns FALSE for it. Comparisons against strings such as "NA" or "<NA>" do not match it either.

The help in RStudio says:

addNA modifies a factor by turning NA into an extra level (so that NA values are counted in tables, for instance).

How do you subset in such cases? The simple way is to first convert the column with as.character and then check is.na.

Hence, solution could be:

mc_masterc[is.na(as.character(mc_masterc[,7])),]
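A minimal, self-contained sketch of why the conversion helps. The data frame `toy` and its columns are made up for illustration and are not the OP's data:

```r
# Hypothetical toy data: a factor in which NA has been made an explicit level
toy <- data.frame(id = 1:5,
                  term = addNA(factor(c(12, 24, NA, 12, NA))))

levels(toy$term)                 # the level set includes NA
is.na(toy$term)                  # all FALSE: the underlying integer codes are not missing
is.na(as.character(toy$term))    # TRUE at positions 3 and 5

# Keep only the rows whose term is the NA level
na_rows <- toy[is.na(as.character(toy$term)), ]
nrow(na_rows)
```

Converting to character maps each code back to its level label, and the label of the extra level is a genuine missing value, so is.na() can see it again.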

Earlier answer before updates from OP:

One can try something like:

mc_masterc[(mc_masterc[,7]) == "NA",]

An example using test data that resembles the OP's data set:

> x[x[,3]=="NA",]
#   a  b  c
#1  3 10 NA
#3 11  4 NA
#4 18  8 NA

> summary(x[,3])
# 3  4 NA 
# 1  1  3 

x
#   a  b  c
#1  3 10 NA
#2 12  1  3
#3 11  4 NA
#4 18  8 NA
#5 14  6  4

str(x)
#'data.frame':  5 obs. of  3 variables:
# $ a: int  3 12 11 18 14
# $ b: int  10 1 4 8 6
# $ c: Factor w/ 3 levels "3","4","NA": 3 1 3 3 2
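The answer does not show how `x` was built. One possible reconstruction, assuming column `c` holds the literal string "NA" as a factor level (which is what the str() output suggests), might look like this:

```r
# Hypothetical reconstruction of the test data shown above:
# column c is a factor in which "NA" is ordinary text, not a missing value
x <- data.frame(a = c(3L, 12L, 11L, 18L, 14L),
                b = c(10L, 1L, 4L, 8L, 6L),
                c = factor(c("NA", "3", "NA", "NA", "4")))

is.na(x$c)           # all FALSE, because nothing is actually missing
x[x$c == "NA", ]     # string comparison finds rows 1, 3 and 4
```

This is the opposite situation from an NA level created by addNA: here a string comparison works and is.na() does not.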

Answer 2

Here's what I suspect is happening. My guess is that column 7 is really a factor variable, one of whose levels is the literal string "<NA>".

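> # stringsAsFactors = TRUE makes both columns factors (the default before R 4.0)
> test <- data.frame(one = c(2,4,6,'<NA>'), two=letters[1:4], stringsAsFactors = TRUE)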
> test[is.na(test$one) ,]
[1] one two
<0 rows> (or 0-length row.names)

> test[test$one == "NA" ,]
[1] one two
<0 rows> (or 0-length row.names)
> test[test$one == "<NA>" ,]
   one two
4 <NA>   d

> table(test$one)

<NA>    2    4    6 
   1    1    1    1
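Since the two answers describe different ways a factor can appear to contain NA, a quick check of how "missing" is actually encoded can tell them apart. The helper below, `na_encoding`, is a hypothetical name for illustration and is not part of base R:

```r
# Hypothetical helper: report how "missing" is encoded in a factor
na_encoding <- function(f) {
  stopifnot(is.factor(f))
  c(true_na      = anyNA(unclass(f)),   # codes are missing: is.na(f) works directly
    na_level     = anyNA(levels(f)),    # NA is an explicit level, e.g. from addNA()
    string_level = any(c("NA", "<NA>", "") %in% levels(f)))  # literal text stand-ins
}

na_encoding(factor(c("2", "4", "6", "<NA>")))   # text level "<NA>"
na_encoding(addNA(factor(c(2, 4, NA))))         # NA as an explicit level
na_encoding(factor(c(2, 4, NA)))                # ordinary missing values
```

Depending on which entry is TRUE, the subset would use is.na(f), is.na(as.character(f)), or a string comparison such as f == "<NA>".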

Subsetting on NA
