Cleaning a dataset: setting rows to NA based on different columns and different values

Time: 2016-01-14 21:18:44

Tags: r dataframe na

My dataset looks like this:

    SO4     PO4 LabConductivity  LabPH Notes   QR
1 0.131 0.00100            3.98   5.25   dmz    B
2 0.109 0.00126            3.54   5.27    mz    B
3 0.219 -0.5656            6.28   5.23  <NA>    A
4 0.219 -0.5656            6.28  -5.66  <NA>    C
5 0.219 -0.5656            6.28   5.23  <NA>    C

I want to clean it so that the whole row is set to NA whenever "QR" is C.

So I can do this:

mydata[mydata$QR=="C",] <- NA

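However, I want to keep doing this for other variables, for example setting the whole row to NA when LabPH > 6 or < 0.

If I do the same thing again, I get the following error: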

Error in `[<-.data.frame`(`*tmp*`, mydata$LabPH > 5 | mydata$LabPH < 0,  :
  missing values are not allowed in subscripted assignments of data frames

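Is there another way to do this? Is there some kind of ignoreNA function for this case? Or is there a better approach?

Thanks very much in advance. Cheers, Sandra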

3 Answers:

Answer 0: (score: 1)

You just need to wrap the logical test in which.

For example,

mydata[which(mydata$LabPH > 5.25),] <- NA
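
To see why which helps, here is a minimal sketch that rebuilds the question's sample data, reproduces the failing assignment, and then applies the which version. The variable names bad_ph and err are only illustrative, and the 5.25 threshold is the one used in this answer rather than the asker's 6/0 limits.

```r
# Sample data copied from the question (missing Notes entered as real NA).
mydata <- data.frame(
  SO4 = c(0.131, 0.109, 0.219, 0.219, 0.219),
  PO4 = c(0.00100, 0.00126, -0.5656, -0.5656, -0.5656),
  LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28),
  LabPH = c(5.25, 5.27, 5.23, -5.66, 5.23),
  Notes = c("dmz", "mz", NA, NA, NA),
  QR = c("B", "B", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# First pass from the question: blank out every row flagged "C".
mydata[mydata$QR == "C", ] <- NA

# LabPH now holds NA in rows 4 and 5, so the comparison is NA there too.
bad_ph <- mydata$LabPH > 5.25

# A logical index containing NA is rejected in data frame assignment.
err <- tryCatch(
  {
    mydata[bad_ph, ] <- NA
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# which() returns only the positions that are TRUE and drops the NA ones,
# so the same assignment goes through.
mydata[which(bad_ph), ] <- NA
```

After this runs, rows 2, 4 and 5 are entirely NA, and err holds the message from the failed attempt.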

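Answer 1: (score: 1)

If the column you use for the logical test contains NA, the data.frame cannot be subset cleanly, because a comparison with NA yields NA rather than TRUE or FALSE. For example, you can see that the rows with LabPH = NA are not filtered out; they come back as all-NA rows.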

> mydata[mydata$LabPH > 5.25,]
       SO4     PO4 LabConductivity LabPH Notes   QR
2    0.109 0.00126            3.54  5.27    mz    B
NA      NA      NA              NA    NA  <NA> <NA>
NA.1    NA      NA              NA    NA  <NA> <NA>

which works because it excludes the rows where LabPH is NA. Another way is to use !is.na() to exclude the NAs:

> new <- mydata[!is.na(mydata$LabPH)&mydata$LabPH > 5.25,]
> new
    SO4     PO4 LabConductivity LabPH Notes QR
2 0.109 0.00126            3.54  5.27    mz  B
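
The same !is.na() guard can also be used for the assignment the question asks about. The sketch below wraps it in a small helper; blank_rows is a hypothetical name, not a base R function. The pH rule is applied first here so that the second rule has to cope with an NA in QR.

```r
# Hypothetical helper: set whole rows to NA wherever `cond` is TRUE,
# treating NA in `cond` as "no match" instead of raising an error.
blank_rows <- function(df, cond) {
  df[!is.na(cond) & cond, ] <- NA
  df
}

# Sample data copied from the question (missing Notes entered as real NA).
mydata <- data.frame(
  SO4 = c(0.131, 0.109, 0.219, 0.219, 0.219),
  PO4 = c(0.00100, 0.00126, -0.5656, -0.5656, -0.5656),
  LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28),
  LabPH = c(5.25, 5.27, 5.23, -5.66, 5.23),
  Notes = c("dmz", "mz", NA, NA, NA),
  QR = c("B", "B", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# Rule 1: pH outside the 0 to 6 range (blanks row 4).
mydata <- blank_rows(mydata, mydata$LabPH > 6 | mydata$LabPH < 0)

# Rule 2: QR flagged "C". QR is already NA in row 4, which the guard skips;
# row 5 is blanked.
mydata <- blank_rows(mydata, mydata$QR == "C")
```

Because each call tolerates NA left behind by earlier calls, further rules can be chained in any order.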

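Answer 2: (score: 0)

Isn't replacing an entire row with NA as good as excluding it from the data? If so, given your conditions (drop rows where QR = "C" and keep only LabPH between 0 and 6), here is one way...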

# Please note I added a random 6th row with LabPH = 7.0. 

SO4 = c(0.131,0.109,0.219,0.219,0.219,0.21)
PO4 = c(0.00100,0.00126,-0.5656,-0.5656,-0.5656,-0.532)
LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28,6.25)
LabPH = c(5.25,5.27,5.23,-5.66,5.23,7.0)
Notes = c("dmz","mz",NA,NA,NA,"mz")  # real NA values, not the string "<NA>"
QR = c("B","B","A","C","C","B")

# create a data frame
df = data.frame(SO4,PO4,LabConductivity,LabPH,Notes,QR)
df

    SO4      PO4 LabConductivity LabPH Notes QR
1 0.131  0.00100            3.98  5.25   dmz  B
2 0.109  0.00126            3.54  5.27    mz  B
3 0.219 -0.56560            6.28  5.23  <NA>  A
4 0.219 -0.56560            6.28 -5.66  <NA>  C
5 0.219 -0.56560            6.28  5.23  <NA>  C
6 0.210 -0.53200            6.25  7.00    mz  B

# Subset according to your conditions

df[which((df$LabPH > 0 & df$LabPH < 6) & df$QR != "C"),]
# output
    SO4      PO4 LabConductivity LabPH Notes QR
1 0.131  0.00100            3.98  5.25   dmz  B
2 0.109  0.00126            3.54  5.27    mz  B
3 0.219 -0.56560            6.28  5.23  <NA>  A
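
If the rows should stay in place as NA rather than be dropped, as the question originally asked, the same condition can be inverted and used for assignment. This is only a sketch on the six-row data above; keep and cleaned are illustrative names, and `%in% TRUE` is used here as one way to turn any NA in the condition into FALSE.

```r
# Six-row data from this answer (missing Notes entered as real NA).
df <- data.frame(
  SO4 = c(0.131, 0.109, 0.219, 0.219, 0.219, 0.21),
  PO4 = c(0.00100, 0.00126, -0.5656, -0.5656, -0.5656, -0.532),
  LabConductivity = c(3.98, 3.54, 6.28, 6.28, 6.28, 6.25),
  LabPH = c(5.25, 5.27, 5.23, -5.66, 5.23, 7.0),
  Notes = c("dmz", "mz", NA, NA, NA, "mz"),
  QR = c("B", "B", "A", "C", "C", "B"),
  stringsAsFactors = FALSE
)

# Rows that pass both checks: pH strictly between 0 and 6, and QR not "C".
keep <- (df$LabPH > 0 & df$LabPH < 6) & df$QR != "C"

# `keep %in% TRUE` is TRUE only where keep is TRUE; NA becomes FALSE,
# so the negated index never contains NA.
cleaned <- df
cleaned[!(keep %in% TRUE), ] <- NA
```

Here cleaned still has six rows, with rows 4 to 6 blanked, so row positions line up with the original data.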