I have a data.table and want to remove the rows in which every column except two specific ones is NA. For example, my data.table looks like this:
> ww2
Sepal.Length Sepal.Width Petal.Length Petal.Width Species index
1: 5.1 3.5 1.4 0.2 setosa 1
2: 4.9 3.0 1.4 0.2 setosa 2
3: 4.7 3.2 1.3 0.2 setosa 3
4: 4.6 3.1 1.5 0.2 setosa 4
5: 5.0 3.6 1.4 0.2 setosa 5
6: 5.1 3.5 1.4 0.2 dffdsdf 1
7: 4.9 3.0 1.4 0.2 dffdsdf 2
8: 4.7 3.2 1.3 0.2 dffdsdf 3
9: NA NA NA NA dffdsdf 4
10: NA NA NA NA dffdsdf 5
Its dput() output is:
structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.1, 4.9,
4.7, NA, NA), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.5, 3,
3.2, NA, NA), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.4,
1.4, 1.3, NA, NA), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.2,
0.2, 0.2, NA, NA), Species = structure(c(1L, 1L, 1L, 1L, 1L,
4L, 4L, 4L, 4L, 4L), class = "factor", .Label = c("setosa", "versicolor",
"virginica", "dffdsdf")), index = c(1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L)), .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length",
"Petal.Width", "Species", "index"), row.names = c(NA, -10L), class = "data.frame")
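In the data.table above, I want to delete rows 9 and 10. My real data.table is very large and has many more columns, so explicitly listing the columns that contain NA is impractical. The columns that are never NA, however, are fixed (there are two of them; in this example they are index and Species).
I am looking for an efficient and fast solution.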
Answer 0 (score: 2)
Based on the data you provided, I would do something like this:
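library(dplyr)
# TRUE for every row that has at least one NA outside Species and index
na_rows = ww2 %>%
  select(-Species, -index) %>%
  is.na() %>%
  rowSums() > 0
# keep only the rows that were not flagged
ww2 %>%
  filter(!na_rows)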
Sepal.Length Sepal.Width Petal.Length Petal.Width Species index
1 5.1 3.5 1.4 0.2 setosa 1
2 4.9 3.0 1.4 0.2 setosa 2
3 4.7 3.2 1.3 0.2 setosa 3
4 4.6 3.1 1.5 0.2 setosa 4
5 5.0 3.6 1.4 0.2 setosa 5
6 5.1 3.5 1.4 0.2 dffdsdf 1
7 4.9 3.0 1.4 0.2 dffdsdf 2
8 4.7 3.2 1.3 0.2 dffdsdf 3
Or, in a more base-R style (I just happen to like dplyr):
na_rows = rowSums(is.na(ww2[, -which(names(ww2) %in% c('Species', 'index')), with = FALSE])) > 0
ww2[!na_rows,]
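Note that the `> 0` test flags a row as soon as any of the other columns is NA. On the sample data that gives the same result, but the question asks to drop only the rows where all of the other columns are NA. A minimal data.table sketch of that stricter rule might look like the following. The small table and the names `keep_cols`, `check_cols`, `n_na` and `result` are illustrative assumptions, not part of the original post.

```r
library(data.table)

# Small stand-in for ww2 (illustrative values, not the original data).
ww2 <- data.table(
  Sepal.Length = c(5.1, 4.9, NA, NA),
  Sepal.Width  = c(3.5, NA, NA, NA),
  Species      = c("setosa", "setosa", "dffdsdf", "dffdsdf"),
  index        = c(1L, 2L, 1L, 2L)
)

keep_cols  <- c("Species", "index")           # columns known to be non-NA
check_cols <- setdiff(names(ww2), keep_cols)  # every other column

# Count the NAs per row across check_cols only.
n_na <- ww2[, rowSums(is.na(.SD)), .SDcols = check_cols]

# Drop a row only when all of its check_cols are NA.
result <- ww2[n_na < length(check_cols)]
print(result)
```

Here the second row has a single NA and is kept, while the last two rows are removed. Because `check_cols` is derived with `setdiff()`, only the two fixed columns need to be named, however many other columns the table has.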