Keep rows with a minimum number of observations in each group

Time: 2019-09-19 19:27:58

Tags: r row subset

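Using the data below, I want to delete every row that contains one or fewer 1s, or one or fewer 2s. The dataset contains only the values 1 and 2.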

mydata
      X1 X2 X3 X4 X5 X6 X7
    1  1  2  2  1  1  2  2
    2  2  2  2  1  2  2  2
    3  1  1  1  1  2  2  2
    4  2  1  2  1  2  2  1
    5  2  1  1  1  1  1  1
    6  1  1  1  1  1  1  1
    7  2  2  2  2  2  2  2

Rows #2, 5, 6, and 7 should be deleted, because

sum(mydata[2,]=="1") # the 2nd row contains only one 1
sum(mydata[5,]=="2") # the 5th row contains only one 2
sum(mydata[6,]=="2") # the 6th row contains no 2s
sum(mydata[7,]=="1") # the 7th row contains no 1s

Thanks for your help.

2 answers:

Answer 0 (score: 3)

d <- mydata  # d is the data frame from the question
d[rowSums(d == 1) > 1 & rowSums(d == 2) > 1,]
#  X1 X2 X3 X4 X5 X6 X7
#1  1  2  2  1  1  2  2
#3  1  1  1  1  2  2  2
#4  2  1  2  1  2  2  1
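
To see why the subset keeps rows 1, 3, and 4, it can help to look at the intermediate counts. The following is a minimal sketch that rebuilds the question's data and stores each step in its own variable; the names `ones`, `twos`, and `keep` are illustrative and do not appear in the answer.

```r
# Rebuild the question's data frame (same values as in the question)
mydata <- data.frame(
  X1 = c(1L, 2L, 1L, 2L, 2L, 1L, 2L),
  X2 = c(2L, 2L, 1L, 1L, 1L, 1L, 2L),
  X3 = c(2L, 2L, 1L, 2L, 1L, 1L, 2L),
  X4 = c(1L, 1L, 1L, 1L, 1L, 1L, 2L),
  X5 = c(1L, 2L, 2L, 2L, 1L, 1L, 2L),
  X6 = c(2L, 2L, 2L, 2L, 1L, 1L, 2L),
  X7 = c(2L, 2L, 2L, 1L, 1L, 1L, 2L)
)

# mydata == 1 is a logical matrix; rowSums() counts the TRUEs in each row
ones <- rowSums(mydata == 1)  # number of 1s per row: 3 1 4 3 6 7 0
twos <- rowSums(mydata == 2)  # number of 2s per row: 4 6 3 4 1 0 7

# A row is kept only if it has more than one 1 AND more than one 2
keep <- ones > 1 & twos > 1
mydata[keep, ]
```

Rows 2 and 7 fail the test on `ones`, and rows 5 and 6 fail the test on `twos`, which matches the four rows the question asks to drop.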

Answer 1 (score: 2)

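One option is to loop over the rows, build a table for each row, and check that the frequency of every element is greater than 1 (this also covers the case where a row has more unique values).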

mydata[apply(mydata, 1, function(x) all(table(factor(x, levels = 1:2)) >1)),]
#  X1 X2 X3 X4 X5 X6 X7
#1  1  2  2  1  1  2  2
#3  1  1  1  1  2  2  2
#4  2  1  2  1  2  2  1
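
If the threshold or the set of allowed values might change, the same idea could be wrapped in a small function. This is only a sketch under stated assumptions: `keep_rows_min_count`, `values`, `min_count`, and the `toy` data frame are hypothetical names introduced here, not part of either answer.

```r
# Hypothetical helper: keep rows in which every value in `values`
# occurs at least `min_count` times.
keep_rows_min_count <- function(df, values = 1:2, min_count = 2) {
  ok <- apply(df, 1, function(x) {
    # factor() with explicit levels makes absent values count as 0
    counts <- table(factor(x, levels = values))
    all(counts >= min_count)
  })
  df[ok, , drop = FALSE]
}

# Small illustrative data frame (not the question's data)
toy <- data.frame(a = c(1, 2, 1), b = c(1, 2, 2), c = c(2, 2, 1), d = c(2, 1, 2))

# Row 2 holds only one 1, so it is dropped; rows 1 and 3 remain
keep_rows_min_count(toy)
```

Setting the `levels` explicitly matters: without it, a row made up of a single value would produce a one-entry table, and the missing value would never be checked.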

Data

mydata <- structure(list(X1 = c(1L, 2L, 1L, 2L, 2L, 1L, 2L), X2 = c(2L, 
2L, 1L, 1L, 1L, 1L, 2L), X3 = c(2L, 2L, 1L, 2L, 1L, 1L, 2L), 
    X4 = c(1L, 1L, 1L, 1L, 1L, 1L, 2L), X5 = c(1L, 2L, 2L, 2L, 
    1L, 1L, 2L), X6 = c(2L, 2L, 2L, 2L, 1L, 1L, 2L), X7 = c(2L, 
    2L, 2L, 1L, 1L, 1L, 2L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7"))