Question

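I want to identify the missing values in an R data.table.

For each column of the dataset, I want the IDs (from the id column) of the rows where the value is missing.

I am using apply(is.na(dt_tb), 2, which). This gives me the row positions, and I would like to replace those positions with the ID numbers from the id column.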

library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                 coll = c("this", NA, "NA", "text"),
                 cyy = c(TRUE, FALSE, TRUE, TRUE),
                 hhh = c(2.5, 4.2, 3.2, NA),
                 stringsAsFactors = FALSE)

apply(is.na(dt_tb), 2, which)

Example output:

$id
integer(0)

$coll
[1] 2

$cyy
integer(0)

$hhh
[1] 4

I want the IDs instead (the string "NA" in row 3 should count as missing too):

$id
integer(0)

$coll
[1] 6 7

$cyy
integer(0)

$hhh
[1] 15

Answer 1

You can use unlist to flatten the positions and look them up in dt_tb$id, then use relist to restore the original list structure.

i <- apply(is.na(dt_tb) | dt_tb=="NA", 2, which)
relist(dt_tb$id[unlist(i)], i)
#$id
#numeric(0)
#
#$coll
#[1] 6 7
#
#$cyy
#numeric(0)
#
#$hhh
#[1] 15
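
To see why this works, here is a minimal sketch of the unlist/relist round trip on a hand-written list. The names pos, ids, flat, picked and res are illustrative and not part of the answer; pos stands in for what apply(..., 2, which) returns for the example data.

```r
# Hypothetical stand-in for the list returned by apply(..., 2, which)
pos <- list(id = integer(0), coll = c(2L, 3L), cyy = integer(0), hhh = 4L)
ids <- c(5, 6, 7, 15)          # the id column of the example data

flat   <- unlist(pos)          # all row positions in one vector: 2 3 4
picked <- ids[flat]            # id for each position: 6 7 15
res    <- relist(picked, pos)  # cut the flat vector back into the shape of pos

res$coll  # 6 7
res$hhh   # 15
```

relist uses pos only as a template for the lengths and names, so empty entries such as id and cyy come back as zero-length vectors.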

Answer 2

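You can use which with arr.ind = TRUE to get the row and column indices of the cells that contain NA or "NA". You can then use split to get a named list. Note that columns without any missing values (id and cyy here) do not appear in the result.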

mat <- which(is.na(dt_tb) | dt_tb == 'NA', arr.ind = TRUE)
split(dt_tb$id[mat[, 1]], names(dt_tb)[mat[, 2]])

#$coll
#[1] 6 7

#$hhh
#[1] 15
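
If the empty columns should be kept, as in the output the question asks for, one possible variation is to split on a factor whose levels are all column names. This is a sketch, not part of the original answer. It uses a plain data.frame named df (an assumption, chosen so the snippet runs without extra packages); the same calls are the ones the answer applies to the data.table.

```r
# Same example data, built as a base data.frame (assumption for portability)
df <- data.frame(id = c(5, 6, 7, 15),
                 coll = c("this", NA, "NA", "text"),
                 cyy = c(TRUE, FALSE, TRUE, TRUE),
                 hhh = c(2.5, 4.2, 3.2, NA),
                 stringsAsFactors = FALSE)

# Row/column indices of cells that are NA or the string "NA"
mat <- which(is.na(df) | df == "NA", arr.ind = TRUE)

# A factor with every column name as a level keeps empty groups in split()
grp <- factor(names(df)[mat[, "col"]], levels = names(df))
res <- split(df$id[mat[, "row"]], grp)

names(res)  # "id" "coll" "cyy" "hhh"
res$coll    # 6 7
```

split keeps unused factor levels by default, which is what produces the empty id and cyy entries.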

Answer 3

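You can use complete.cases(dt_tb). Note that it works row by row and only detects real NA values, so the string "NA" in row 3 is not flagged.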

# install.packages("data.table")  # run once if the package is not installed
library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA,"NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA),
                    stringsAsFactors = FALSE)


complete.cases(dt_tb)            # returns: TRUE FALSE TRUE FALSE

which(!complete.cases(dt_tb))    # returns the row numbers: 2 4

dt_tb[!complete.cases(dt_tb), ]  # returns the rows that contain an NA

Update:

dt_tb[which(!complete.cases(dt_tb)), 1]  # returns the IDs

   id
1:  6
2: 15
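
Because complete.cases only sees real NA values, the ID 7 (whose coll value is the string "NA") is missing from the result above. A rough sketch of one way around this, assuming it is acceptable to overwrite the "NA" strings, is to convert them to real NA first. The snippet uses a base data.frame named df (an assumption, so it runs without extra packages) and is not part of the original answer.

```r
# Same example data, built as a base data.frame (assumption for portability)
df <- data.frame(id = c(5, 6, 7, 15),
                 coll = c("this", NA, "NA", "text"),
                 cyy = c(TRUE, FALSE, TRUE, TRUE),
                 hhh = c(2.5, 4.2, 3.2, NA),
                 stringsAsFactors = FALSE)

# Turn the literal string "NA" into a real missing value
df$coll[which(df$coll == "NA")] <- NA_character_

incomplete <- !complete.cases(df)  # FALSE TRUE TRUE TRUE
missing_ids <- df$id[incomplete]   # 6 7 15
```

This still reports IDs per row rather than per column, so it answers a slightly different question from the other two answers.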

Identifying missing values in an R data.table
