Question

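I have a large dataset that needs cleaning. When I compute summary statistics, some values are missing, so I want to delete the observations that have NA in the specific variable columns I am interested in.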

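# Pass the inline data via `text =`; a bare string would be treated as a file name.
# strip.white = TRUE trims the spaces that follow each comma.
df <- read.csv(text = "ID, gender, education, IQ, testscore
1, 0, 7, 102, 18
2, NA, 9, NA, 32
3, NA, 8, 78, 33
4, NA, NA, 90, 10
5, 0, 4, 90, 12
6, 0, 4, 99, NA", strip.white = TRUE)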

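Say I only want to delete rows with NA in the education, IQ, and testscore columns, because those are the only ones I need for the analysis; NAs in gender do not matter.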

My correctly filtered data, newdf, should look like this:

ID, gender, education, IQ, testscore
1, 0, 7, 102, 18
3, NA, 8, 78, 33
5, 0, 4, 90, 12

Answer 1

With the tidyr package, this is trivial:

library(tidyr)
newdf <- df %>% drop_na(education, IQ, testscore)
newdf
#  ID gender education  IQ testscore
# 1  1      0         7 102        18
# 3  3     NA         8  78        33
# 5  5      0         4  90        12
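
For readers who prefer not to load a package, the same filter can be sketched in base R with complete.cases(). This is only an illustrative alternative, not part of the original answer; the helper vector name cols is made up for the example, and the data are retyped here without spaces so the block runs on its own.

```r
# Rebuild the example data from the question (inline CSV via `text =`).
df <- read.csv(text = "ID,gender,education,IQ,testscore
1,0,7,102,18
2,NA,9,NA,32
3,NA,8,78,33
4,NA,NA,90,10
5,0,4,90,12
6,0,4,99,NA")

# Columns that must be non-missing (hypothetical helper name);
# gender is deliberately left out, so its NAs are ignored.
cols <- c("education", "IQ", "testscore")

# complete.cases() is TRUE for rows with no NA in the selected columns.
newdf <- df[complete.cases(df[, cols]), ]
newdf
```

Restricting complete.cases() to a subset of columns is what keeps row 3, whose only missing value is in gender.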
