答案 0 :(得分:2)
如果这是在data.frame中,您可以使用dplyr
包进行一些操作。我们可以按Genes
和count
对数据进行分组。然后我们只需设置过滤条件即可删除记录。
require(dplyr)
df <- data.frame(
Genes=c('50S' ,'abcb1' ,'abcb1' ,'abcb1' ,'ABCC8' ,'ABL' ,'ABL' ,'ABL' ,'ABL' ,'ACAT1' ,'ACAT1' ),
Values=c(-0.627323448, -0.226358414, 0.347305901 ,0.371632631 ,0.099485307 ,0.078512979 ,-0.426643782, -1.060270668, -2.059157991, 0.608899174 ,-0.048795611)
)
#group, filter and join back to get subset the data
df %>% group_by(Genes)
%>% summarize(count=n())
%>% filter(count>=3)
%>% inner_join(df)
%>% select(Genes,Values)
根据@ Lamia的评论,可以将其简化为:
df %>% group_by(Genes) %>% filter(n()>=3)
答案 1 :(得分:0)
# generating data
x <- c(NA, NA, NA, NA, 2, 3) # has n < 3!
y <- c(1, 2, 3, 4, 5, 6)
z <- c(1 ,2, 3, NA, 5, 6)
df <- data.frame(x,y,z)
colsToKeep <- c() # making empty vector I will fill with column numbers
for (i in 1:ncol(df)) { # for every column
if (sum(!is.na(df[,i]))>=3) { # if that column has greater than 3 valid values (i.e., ones that are not na...
colsToKeep <- c(colsToKeep, i) # then save that column number into this vector
}
}
df[,colsToKeep] # then use that vector to call the columns you want
请注意,R将FALSE视为0,将TRUE视为1,这就是sum()
函数在此处的工作方式。
答案 2 :(得分:0)
使用table
:
gene <- c("A","A","A","B","B","C","C","C","C","D")
value <- c(seq(1,10,1))
df<-data.frame(gene,value)
df
gene value
1 A 1
2 A 2
3 A 3
6 C 6
7 C 7
8 C 8
9 C 9
su<-data.frame(table(df$gene))
df_keep <-df[which(df$gene %in% su[which(su$Freq>2),1]),]
df_keep
gene value
1 A 1
2 A 2
3 A 3
6 C 6
7 C 7
8 C 8
9 C 9