Given a data frame with 6 variables:
x1 var1 x2 var2 x3 var3
How can I count the missing values in the variables var1, var2, and var3 BY ROW, so that the data frame ends up with the following variables:
x1 var1 x2 var2 x3 var3 num.missing
Answer 0 (score: 0)
A reproducible dataset with the expected answer would have been very helpful. I'll create one for you:
set.seed(1337)
dat <- data.frame(x1=1:10, var1=runif(10),
x2=11:20, var2=runif(10),
x3=21:30, var3=runif(10))
dat
x1 var1 x2 var2 x3 var3
1 1 0.57632155 11 0.97943029 21 0.84916377
2 2 0.56474213 12 0.99371759 22 0.72408821
3 3 0.07399023 13 0.82735873 23 0.04661798
4 4 0.45386562 14 0.19398230 24 0.15367816
5 5 0.37327926 15 0.98132543 25 0.56259417
6 6 0.33131745 16 0.02522857 26 0.98142569
7 7 0.94763002 17 0.97238848 27 0.93177423
8 8 0.28111731 18 0.92379666 28 0.89861494
9 9 0.24540405 19 0.33913968 29 0.46979326
10 10 0.14604362 20 0.24657940 30 0.99500811
Replace a random sample of values in each var column with NA:
dat[sample(1:10, 3), "var1"] <- NA
dat[sample(1:10, 3), "var2"] <- NA
dat[sample(1:10, 3), "var3"] <- NA
dat
x1 var1 x2 var2 x3 var3
1 1 NA 11 0.9794303 21 0.8491638
2 2 0.56474213 12 0.9937176 22 0.7240882
3 3 0.07399023 13 NA 23 NA
4 4 0.45386562 14 0.1939823 24 0.1536782
5 5 0.37327926 15 0.9813254 25 0.5625942
6 6 NA 16 NA 26 0.9814257
7 7 0.94763002 17 0.9723885 27 NA
8 8 0.28111731 18 NA 28 0.8986149
9 9 NA 19 0.3391397 29 0.4697933
10 10 0.14604362 20 0.2465794 30 NA
Since logical values are coerced to integers in arithmetic (TRUE == 1, FALSE == 0), we can simply add up the is.na() tests:
dat$num.missing <- is.na(dat$var1) + is.na(dat$var2) + is.na(dat$var3)
dat
x1 var1 x2 var2 x3 var3 num.missing
1 1 NA 11 0.9794303 21 0.8491638 1
2 2 0.56474213 12 0.9937176 22 0.7240882 0
3 3 0.07399023 13 NA 23 NA 2
4 4 0.45386562 14 0.1939823 24 0.1536782 0
5 5 0.37327926 15 0.9813254 25 0.5625942 0
6 6 NA 16 NA 26 0.9814257 2
7 7 0.94763002 17 0.9723885 27 NA 1
8 8 0.28111731 18 NA 28 0.8986149 1
9 9 NA 19 0.3391397 29 0.4697933 1
10 10 0.14604362 20 0.2465794 30 NA 1
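If there are more than a handful of columns to check, writing out one is.na() term per column gets tedious. A possible alternative, sketched here on a small made-up data frame (toy and vars are hypothetical names, not part of the question), is to apply is.na() to the selected columns at once and count the TRUE values per row with rowSums():

```r
# Hypothetical toy data; the values are made up for illustration only
toy <- data.frame(x1   = 1:4,
                  var1 = c(NA, 0.5, 0.1, NA),
                  x2   = 11:14,
                  var2 = c(0.9, 0.4, NA, NA),
                  x3   = 21:24,
                  var3 = c(0.8, 0.7, NA, NA))

# Names of the columns whose missing values should be counted
vars <- c("var1", "var2", "var3")

# is.na() on a data frame returns a logical matrix;
# rowSums() then counts the TRUE entries in each row
toy$num.missing <- rowSums(is.na(toy[, vars]))
toy
```

This should give the same counts as summing the individual is.na() calls; only the vars vector needs to change if the set of columns changes.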