My data looks like this:
gene_id logFC logCPM LR PValue FDR
FBgn0000422 -1.875410209 4.429477429 25.16243497 5.27E-07 9.46E-05
FBgn0000422 1.262578335 4.429477429 11.65196417 0.000641348 0.022693702
FBgn0000422 -1.55793362 4.429477429 18.01707407 2.19E-05 0.00235694
FBgn0000565 -1.225082505 6.984450503 22.91546921 1.69E-06 0.000232455
FBgn0000565 -0.989958212 6.984450503 15.45759475 8.44E-05 0.006343374
FBgn0000565 -0.947467121 6.984450503 14.06298678 0.000176789 0.010290503
FBgn0001257 -1.135767061 6.745553159 33.67172953 6.52E-09 2.83E-06
FBgn0001257 -0.806003432 6.745553159 17.36036853 3.09E-05 0.003015214
FBgn0001257 -0.90371115 6.745553159 21.8449115 2.96E-06 0.000523406
FBgn0001291 -0.850144165 5.096971424 42.18504599 8.30E-11 8.08E-08
FBgn0001291 -0.892576562 5.096971424 47.27263627 6.18E-12 2.08E-08
FBgn0001291 -0.629617901 5.096971424 24.12565834 9.02E-07 0.000195886
FBgn0001301 -0.72615833 3.849906562 20.61723199 5.61E-06 0.000634277
FBgn0001301 -0.647614044 3.849906562 16.55276488 4.73E-05 0.004244782
FBgn0001301 -0.700985769 3.849906562 19.62582463 9.42E-06 0.001242629
FBgn0002719 0.39714033 8.153175244 9.467307643 0.002091661 0.045180557
FBgn0002719 -0.566665823 8.153175244 19.77575512 8.71E-06 0.001137708
FBgn0002719 0.509820318 8.153175244 15.96243465 6.46E-05 0.005084696
Each gene_id has 3 replicate rows. I want to average across the replicates, which I can do with plyr using the following code:
AvL_univ_DOD_AVG<-ddply(AvL_univ_DOD,.(gene_id),colwise(mean,c("logFC","logCPM","LR","PValue","FDR")))
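However, what I really want is to compute the average for a gene_id only if its three "logFC" values all have the same sign (all negative or all positive).
I do not need to keep the gene_ids that fail this criterion.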
Answer 0 (score: 1)
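How about filtering out, before using plyr, the rows whose gene_id has neither all-negative nor all-positive values in the logFC column? For example, with data.table: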
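library(data.table)
library(plyr)  # needed for ddply() and colwise() in the last step
AvL_univ_DOD <- data.table(AvL_univ_DOD)
# flag rows with a positive logFC (a logFC of exactly 0 counts as not positive)
AvL_univ_DOD[,sign:=logFC>0]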
#count how many duplicates you have for each gene_id
AvL_univ_DOD[,number_of_duplicates:=.N,by=gene_id]
#count how many positives you have for each gene_id
AvL_univ_DOD[,number_of_pos:=sum(sign),by=gene_id]
# keep only cases where you have all positives or all negatives
AvL_univ_DOD2 <- AvL_univ_DOD[number_of_pos==0|number_of_pos==number_of_duplicates]
# apply plyr
AvL_univ_DOD_AVG<-ddply(AvL_univ_DOD2,.(gene_id),colwise(mean,c("logFC","logCPM","LR","PValue","FDR")))
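The filter and the averaging can also be combined into one grouped data.table step, which avoids the helper columns. The following is only a minimal sketch, not the answer's original method: `toy` and `toy_avg` are hypothetical names, and the table holds just the logFC and FDR values of two genes from the question (one with mixed signs, one with all-negative values). It relies on the data.table behavior that a group whose `j` expression evaluates to NULL is dropped from the result.

```r
library(data.table)

# Two genes taken from the question's data: FBgn0000422 has mixed signs,
# FBgn0000565 is negative in all three replicates.
# (toy and toy_avg are illustrative names, not from the original post.)
toy <- data.table(
  gene_id = rep(c("FBgn0000422", "FBgn0000565"), each = 3),
  logFC   = c(-1.875410209, 1.262578335, -1.55793362,
              -1.225082505, -0.989958212, -0.947467121),
  FDR     = c(9.46E-05, 0.022693702, 0.00235694,
              0.000232455, 0.006343374, 0.010290503)
)

# For each gene_id, return the column means only when every logFC shares
# one sign; otherwise the if() yields NULL and the group is dropped.
toy_avg <- toy[,
  if (all(logFC > 0) || all(logFC < 0)) lapply(.SD, mean),
  by = gene_id,
  .SDcols = c("logFC", "FDR")
]

print(toy_avg)
```

One difference from the code above: here a gene with a logFC of exactly 0 is dropped, whereas the `logFC>0` flag treats a zero as belonging to the negative group. On the full table, the `.SDcols` vector would presumably list all five numeric columns.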