Question

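This is my first time asking a question here, and I hope you can help! I need to use R to delete the rows for genes that have only one or two values.

Basically, I need to get rid of 50S, ABCC8 and ACAT1, because they have n < 3.

My desired output is

Thank you very much!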

Answer 1

If this is in a data.frame, you can use the dplyr package. We group the data by Genes and count the rows in each group, then set a filter condition to drop the unwanted records.

library(dplyr)

df <- data.frame(
  Genes  = c('50S', 'abcb1', 'abcb1', 'abcb1', 'ABCC8', 'ABL', 'ABL', 'ABL', 'ABL', 'ACAT1', 'ACAT1'),
  Values = c(-0.627323448, -0.226358414, 0.347305901, 0.371632631, 0.099485307, 0.078512979,
             -0.426643782, -1.060270668, -2.059157991, 0.608899174, -0.048795611)
)

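# count rows per gene, keep genes with at least 3 rows, then join back to recover those rows
# (the pipe must end each line; a line that starts with %>% is a syntax error)
df %>%
  group_by(Genes) %>%
  summarize(count = n()) %>%
  filter(count >= 3) %>%
  inner_join(df, by = "Genes") %>%
  select(Genes, Values)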

Following @Lamia's comment, this can be simplified to:

df %>% group_by(Genes) %>% filter(n()>=3)
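
For anyone who would rather not load dplyr, the same row filter can be sketched in base R with ave(). This is only an illustration on the example data from this answer, not part of the original answer; the names group_size and kept are made up for the sketch.

```r
# Same example data as in the answer above
df <- data.frame(
  Genes  = c("50S", "abcb1", "abcb1", "abcb1", "ABCC8", "ABL", "ABL", "ABL", "ABL", "ACAT1", "ACAT1"),
  Values = c(-0.627323448, -0.226358414, 0.347305901, 0.371632631, 0.099485307, 0.078512979,
             -0.426643782, -1.060270668, -2.059157991, 0.608899174, -0.048795611)
)

# For every row, compute how many rows share its Genes value
# (group_size is a hypothetical helper name)
group_size <- ave(seq_along(df$Genes), df$Genes, FUN = length)

# Keep only the rows whose gene occurs at least 3 times
kept <- df[group_size >= 3, ]
kept
```

On this data only the abcb1 and ABL rows should remain, since 50S, ABCC8 and ACAT1 each occur fewer than 3 times.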

Answer 2

# generating data
x <- c(NA, NA, NA, NA, 2, 3) # has n < 3!
y <- c(1, 2, 3, 4, 5, 6)
z <- c(1 ,2, 3, NA, 5, 6)
df <- data.frame(x,y,z)

colsToKeep <- c() # empty vector that will be filled with column numbers
for (i in seq_len(ncol(df))) { # for every column
  if (sum(!is.na(df[, i])) >= 3) { # if that column has at least 3 valid (non-NA) values...
    colsToKeep <- c(colsToKeep, i) # ...save that column number into the vector
  }
}

df[, colsToKeep] # then use that vector to select the columns you want

Note that R treats FALSE as 0 and TRUE as 1, which is why sum() can count the non-NA values here.
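
Because logical values are summed as 0 and 1, the loop can also be written without an explicit for, using colSums(). The following is a rough sketch on the same toy data, not the answerer's own code; valid_counts and kept_cols are names invented here.

```r
# Same toy data as above
x <- c(NA, NA, NA, NA, 2, 3)
y <- c(1, 2, 3, 4, 5, 6)
z <- c(1, 2, 3, NA, 5, 6)
df <- data.frame(x, y, z)

# !is.na(df) is a logical matrix; colSums() counts the TRUE entries per column
valid_counts <- colSums(!is.na(df))   # x = 2, y = 6, z = 5

# Keep columns with at least 3 non-NA values;
# drop = FALSE keeps a data.frame even if only one column survives
kept_cols <- df[, valid_counts >= 3, drop = FALSE]
names(kept_cols)
```

Here x is dropped because it has only 2 non-NA values, while y and z are kept.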

Answer 3

Another possible solution, using table:

gene <- c("A","A","A","B","B","C","C","C","C","D")
value <- c(seq(1,10,1))
df<-data.frame(gene,value)
df
   gene value
1     A     1
2     A     2
3     A     3
4     B     4
5     B     5
6     C     6
7     C     7
8     C     8
9     C     9
10    D    10

su<-data.frame(table(df$gene))
df_keep <-df[which(df$gene %in% su[which(su$Freq>2),1]),]
df_keep
  gene value
1    A     1
2    A     2
3    A     3
6    C     6
7    C     7
8    C     8
9    C     9
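
The same idea can be written a little more compactly by indexing the table directly instead of converting it to a data.frame first. This is just a sketch under the same toy data; counts and frequent are hypothetical names introduced here.

```r
gene <- c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D")
value <- 1:10
df <- data.frame(gene, value)

# Number of rows per gene
counts <- table(df$gene)

# Names of the genes that occur at least 3 times
frequent <- names(counts)[as.vector(counts >= 3)]

# Keep only the rows belonging to those genes
df_keep <- df[df$gene %in% frequent, ]
df_keep
```

This should keep the same rows as above (the A and C rows), because su$Freq > 2 and counts >= 3 express the same condition on whole-number counts.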

How to remove rows in R that have fewer than 3 values?
