How do I remove rows in R for groups with fewer than 3 values?

Asked: 2017-05-24 20:20:43

Tags: r row

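This is my first time asking a question, and I hope you can help! Using R, I need to remove the rows for genes that have only one or two values.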


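Basically, I need to get rid of 50S, ABCC8 and ACAT1, because they have n < 3.

My desired output is the same table without the rows for those genes.

Thank you very much!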

3 answers:

Answer 0 (score: 2)

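If this is in a data.frame, you can do it with the dplyr package. Group the data by Genes and count the rows in each group, then set a filter condition that drops the groups with too few records.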

library(dplyr)

df <- data.frame(
  Genes=c('50S'   ,'abcb1' ,'abcb1' ,'abcb1' ,'ABCC8' ,'ABL'   ,'ABL'   ,'ABL'   ,'ABL'   ,'ACAT1' ,'ACAT1' ),
  Values=c(-0.627323448, -0.226358414, 0.347305901 ,0.371632631 ,0.099485307 ,0.078512979 ,-0.426643782, -1.060270668, -2.059157991, 0.608899174 ,-0.048795611)
)

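# Count rows per gene, keep genes with at least 3 rows, then join back to df
# to recover the original rows. Each %>% must end its line rather than start
# the next one, otherwise R treats the first line as a complete expression.
df %>%
  group_by(Genes) %>%
  summarize(count = n()) %>%
  filter(count >= 3) %>%
  inner_join(df, by = "Genes") %>%
  select(Genes, Values)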

Following @Lamia's comment, this can be simplified to:

df %>% group_by(Genes) %>% filter(n()>=3) 
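
For readers without dplyr, the same group-wise filter can probably be sketched in base R. This is an assumed alternative, not part of the original answer: it uses ave() to compute each row's group size and keeps the rows whose gene appears at least 3 times. The names group_size and df_keep are illustrative only.

```r
# Same example data as in the answer above
df <- data.frame(
  Genes  = c("50S", "abcb1", "abcb1", "abcb1", "ABCC8", "ABL", "ABL", "ABL",
             "ABL", "ACAT1", "ACAT1"),
  Values = c(-0.627323448, -0.226358414, 0.347305901, 0.371632631,
             0.099485307, 0.078512979, -0.426643782, -1.060270668,
             -2.059157991, 0.608899174, -0.048795611)
)

# ave() applies length() within each Genes group and returns one value per row,
# so group_size[i] is the number of rows sharing row i's gene
group_size <- ave(seq_along(df$Genes), df$Genes, FUN = length)

# Keep only rows whose gene has at least 3 values
df_keep <- df[group_size >= 3, ]
df_keep
```

With this data, only the abcb1 and ABL rows should remain, which matches what the question asks for.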

Answer 1 (score: 0)

# generating data
x <- c(NA, NA, NA, NA, 2, 3) # has n < 3!
y <- c(1, 2, 3, 4, 5, 6)
z <- c(1 ,2, 3, NA, 5, 6)
df <- data.frame(x,y,z)

colsToKeep <- c() # empty vector to fill with column numbers
for (i in seq_len(ncol(df))) { # for every column
  if (sum(!is.na(df[, i])) >= 3) { # if that column has at least 3 valid (non-NA) values...
    colsToKeep <- c(colsToKeep, i) # ...save that column number into this vector
  }
}

df[, colsToKeep, drop = FALSE] # then use that vector to select the columns you want

Note that R treats FALSE as 0 and TRUE as 1, which is why sum() works as a counter here.
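
As a small illustration of that point, the sketch below counts non-NA values by summing a logical vector, then does the same for every column at once with colSums(). The vectorized version is an assumed shortcut, not something the answer itself proposes, and n_valid_x, n_valid and df_keep are illustrative names.

```r
# Same example data as in the answer above
x <- c(NA, NA, NA, NA, 2, 3)
y <- c(1, 2, 3, 4, 5, 6)
z <- c(1, 2, 3, NA, 5, 6)
df <- data.frame(x, y, z)

# !is.na() yields TRUE/FALSE; sum() counts the TRUEs because TRUE is treated as 1
n_valid_x <- sum(!is.na(df$x))

# colSums() performs the same count for every column in one call
n_valid <- colSums(!is.na(df))

# Keep the columns with at least 3 valid values; drop = FALSE keeps a data.frame
# even if only one column survives
df_keep <- df[, n_valid >= 3, drop = FALSE]
df_keep
```

Here column x has only two valid values, so it is dropped while y and z are kept.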

Answer 2 (score: 0)

Another possible solution, using table:
gene <- c("A","A","A","B","B","C","C","C","C","D")
value <- seq(1, 10, 1)
df <- data.frame(gene, value)
df
   gene value
1     A     1
2     A     2
3     A     3
4     B     4
5     B     5
6     C     6
7     C     7
8     C     8
9     C     9
10    D    10

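# Frequency table of genes: column Var1 holds the gene, Freq its row count
su <- data.frame(table(df$gene))
# Keep rows whose gene occurs more than twice (i.e. at least 3 times)
df_keep <- df[df$gene %in% su$Var1[su$Freq > 2], ]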
df_keep
  gene value
1    A     1
2    A     2
3    A     3
6    C     6
7    C     7
8    C     8
9    C     9