My dataset looks like the one below. I want to clean it so that whenever QR is "C", the whole row is set to NA:
SO4 PO4 LabConductivity LabPH Notes QR
1 0.131 0.00100 3.98 5.25 dmz B
2 0.109 0.00126 3.54 5.27 mz B
3 0.219 -0.5656 6.28 5.23 <NA> A
4 0.219 -0.5656 6.28 -5.66 <NA> C
5 0.219 -0.5656 6.28 5.23 <NA> C
So I can do this:
mydata[mydata$QR=="C",] <- NA
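However, I want to keep doing this for other variables, for example setting the whole row to NA when LabPH > 6 or LabPH < 0.
If I do the same thing again, I get the following error: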
Error in `[<-.data.frame`(`*tmp*`, mydata$LabPH > 5 | mydata$LabPH < 0, : missing values are not allowed in subscripted assignments of data frames
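Is there another way to do this? Is there something like an ignoreNA function for this case? Or is there a better approach altogether?
Thanks a lot in advance. Cheers, Sandra
Answer 0 (score: 1)
You just need to wrap the logical test in which().
For example,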
mydata[which(mydata$LabPH > 5.25),] <- NA
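To make the suggestion above concrete, here is a minimal sketch that rebuilds the five rows from the question and applies two rules in sequence. The data values are copied from the post; the 5.25 threshold is just the one used in this answer, not the asker's real cut-off.

```r
# Rebuild the example data from the question (values copied from the post).
mydata <- data.frame(
  SO4 = c(0.131, 0.109, 0.219, 0.219, 0.219),
  PO4 = c(0.00100, 0.00126, -0.5656, -0.5656, -0.5656),
  LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28),
  LabPH = c(5.25, 5.27, 5.23, -5.66, 5.23),
  Notes = c("dmz", "mz", NA, NA, NA),
  QR = c("B", "B", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# First rule: blank every row flagged "C". Rows 4 and 5 now hold NA
# in every column, including LabPH and QR.
mydata[which(mydata$QR == "C"), ] <- NA

# Second rule: the comparison is NA for rows 4 and 5, but which() returns
# only the positions that are TRUE, so the index contains no NA.
mydata[which(mydata$LabPH > 5.25), ] <- NA
```

After both steps, rows 2, 4 and 5 are entirely NA, while rows 1 and 3 keep their values.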
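Answer 1 (score: 1)
If the column being tested contains NA, the data.frame cannot be subset cleanly with that logical index. For example, you can see that the rows where LabPH is NA are not filtered out and come back as all-NA rows: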
> mydata[mydata$LabPH > 5.25,]
SO4 PO4 LabConductivity LabPH Notes QR
2 0.109 0.00126 3.54 5.27 mz B
NA NA NA NA NA <NA> <NA>
NA.1 NA NA NA NA <NA> <NA>
which() works because it drops the rows where LabPH is NA. Another way is to exclude the NA values explicitly with !is.na():
> new <- mydata[!is.na(mydata$LabPH)&mydata$LabPH > 5.25,]
> new
SO4 PO4 LabConductivity LabPH Notes QR
2 0.109 0.00126 3.54 5.27 mz B
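The same guard also works for assignment, which is what the question asks about. The sketch below uses a small hypothetical data frame (not the asker's data) in which LabPH already contains an NA. It first captures the error raised by the unguarded condition, then applies the guarded one.

```r
# Hypothetical data: LabPH already contains an NA, as it would after an
# earlier cleaning step has blanked a row.
df <- data.frame(
  LabPH = c(5.25, NA, -5.66, 7.0, 5.23),
  QR = c("B", "B", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Evaluates to FALSE, NA, TRUE, TRUE, FALSE.
out_of_range <- df$LabPH > 6 | df$LabPH < 0

# The raw condition contains NA, so the assignment is rejected.
# msg ends up holding the error message instead of "no error".
msg <- tryCatch(
  {
    df[out_of_range, ] <- NA
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# Guarded condition: NA is turned into FALSE, so only real matches are hit.
bad <- !is.na(out_of_range) & out_of_range
df[bad, ] <- NA
```

Rows 3 and 4 end up entirely NA. Row 2 keeps its QR value, because an unknown LabPH is not treated as out of range here. Whether that is the right policy depends on the data.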
Answer 2 (score: 0)
Isn't replacing a whole row with NA just as good as excluding that row? If so, given your conditions (QR is not "C" and LabPH is between 0 and 6), here is one way to do it:
# Please note I added a random 6th row with LabPH = 7.0.
SO4 = c(0.131,0.109,0.219,0.219,0.219,0.21)
PO4 = c(0.00100,0.00126,-0.5656,-0.5656,-0.5656,-0.532)
LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28,6.25)
LabPH = c(5.25,5.27,5.23,-5.66,5.23,7.0)
Notes = c("dmz","mz",NA,NA,NA,"mz")
QR = c("B","B","A","C","C","B")
# create a data frame
df = data.frame(SO4,PO4,LabConductivity,LabPH,Notes,QR)
df
SO4 PO4 LabConductivity LabPH Notes QR
1 0.131 0.00100 3.98 5.25 dmz B
2 0.109 0.00126 3.54 5.27 mz B
3 0.219 -0.56560 6.28 5.23 <NA> A
4 0.219 -0.56560 6.28 -5.66 <NA> C
5 0.219 -0.56560 6.28 5.23 <NA> C
6 0.210 -0.53200 6.25 7.00 mz B
# Subset according to your conditions
df[which((df$LabPH > 0 & df$LabPH < 6) & df$QR != "C"),]
# output
SO4 PO4 LabConductivity LabPH Notes QR
1 0.131 0.00100 3.98 5.25 dmz B
2 0.109 0.00126 3.54 5.27 mz B
3 0.219 -0.56560 6.28 5.23 <NA> A
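If dropping rows really is acceptable, base R's subset() is another option worth considering: its documentation states that missing values in the condition are taken as false, so no which() wrapper is needed. The sketch below is only an illustration. It uses a trimmed copy of the data above, plus a hypothetical 7th row with a missing LabPH to show that behaviour.

```r
# Trimmed copy of the example data; the 7th row (LabPH = NA) is a
# hypothetical addition to show how a missing value is handled.
df <- data.frame(
  LabPH = c(5.25, 5.27, 5.23, -5.66, 5.23, 7.0, NA),
  Notes = c("dmz", "mz", NA, NA, NA, "mz", "mz"),
  QR = c("B", "B", "A", "C", "C", "B", "B"),
  stringsAsFactors = FALSE
)

# Keep rows with LabPH strictly between 0 and 6 and QR other than "C".
# Rows where the condition evaluates to NA are dropped, not returned as NA.
kept <- subset(df, LabPH > 0 & LabPH < 6 & QR != "C")
```

Here kept contains rows 1, 2 and 3. Note that subset() is meant for interactive use. Inside functions, explicit indexing with which() or !is.na() is usually the safer choice.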