I have the following data frame:
stat mTADs DE_genes
5267 -5.452819 chr2:167337500-167447500 chr2:167318145-167341673:+
5268 4.114012 chr6:41532500-41642500 chr6:41555481-41570508:+
5269 9.812369 chr10:18157500-18262500 chr10:18259929-18265882:-
5270 3.371969 chr17:40957500-41062500 chr17:41060000-41071996:-
5271 4.576930 chr17:40957500-41062500 chr17:41012431-41017507:-
5272 2.952151 chr11:72251250-72352500 chr11:72254857-72265270:+
5273 -3.349795 chr1:174307500-174407500 chr1:174405489-174408706:+
5274 -2.685897 chr13:100777500-100877500 chr13:100787949-100874025:-
5275 2.865269 chr13:100777500-100877500 chr13:100718488-100785594:-
5276 6.436959 chr4:150417500-150517500 chr4:150377761-150418774:-
5277 2.622196 chr7:6072500-6162500 chr7:6123828-6142951:+
5278 -5.605531 chr11:48597500-48682500 chr11:48675470-48685185:-
5279 3.554733 chr11:48597500-48682500 chr11:48639642-48665711:+
5280 4.399655 chr11:48597500-48682500 chr11:48638848-48640157:-
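As you can see, some DE_genes belong to the same mTAD. I want to plot the stat values for all DE_genes and group them by mTAD. I was thinking of a horizontal bar chart with the genes on the y-axis and the stat values on the x-axis, grouped by TAD, but first, I don't know how to do that, and second, I think a heatmap might be a better option. Is there a way to do this in R? I have 1700 mTADs in total and want to see whether there are any patterns in the data.
Many thanks, Dimitris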
Answer 0 (score: 0)
You could consider a sorted dot plot instead of a bar chart.
> thing
ID stat mTADs DE_genes
1 5267 -5.452819 chr2:167337500-167447500 chr2:167318145-167341673:+
2 5268 4.114012 chr6:41532500-41642500 chr6:41555481-41570508:+
3 5269 9.812369 chr10:18157500-18262500 chr10:18259929-18265882:-
4 5270 3.371969 chr17:40957500-41062500 chr17:41060000-41071996:-
5 5271 4.576930 chr17:40957500-41062500 chr17:41012431-41017507:-
6 5272 2.952151 chr11:72251250-72352500 chr11:72254857-72265270:+
7 5273 -3.349795 chr1:174307500-174407500 chr1:174405489-174408706:+
8 5274 -2.685897 chr13:100777500-100877500 chr13:100787949-100874025:-
9 5275 2.865269 chr13:100777500-100877500 chr13:100718488-100785594:-
10 5276 6.436959 chr4:150417500-150517500 chr4:150377761-150418774:-
11 5277 2.622196 chr7:6072500-6162500 chr7:6123828-6142951:+
12 5278 -5.605531 chr11:48597500-48682500 chr11:48675470-48685185:-
13 5279 3.554733 chr11:48597500-48682500 chr11:48639642-48665711:+
14 5280 4.399655 chr11:48597500-48682500 chr11:48638848-48640157:-
First, take the median of stat for each mTAD.
medians.of.stat.by.mTADs<-aggregate(stat~mTADs,data=thing,FUN=median)
names(medians.of.stat.by.mTADs)[2]<-"median stat for mTAD"
Now merge these medians back into the original data frame and create a factor whose levels are ordered by the median stat.
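thing <- merge(thing, medians.of.stat.by.mTADs, all.x = TRUE, by = "mTADs")
thing$mTADs.reordered <- factor(thing$mTADs, levels = unique(thing[order(thing$`median stat for mTAD`), "mTADs"]))
The levels are wrapped in unique() because several rows share the same mTAD; duplicated levels trigger a warning in older versions of R and an error in recent ones.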
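As an alternative to the aggregate/merge steps, the same ordering can probably be obtained in one call with base R's reorder(). The following is only a minimal sketch on a five-row subset of the data from the question; the subset and the choice of reorder() are my assumptions, not part of the original answer.

```r
# Five rows copied from the question's data frame (illustrative subset only).
thing <- data.frame(
  stat = c(-5.452819, 3.371969, 4.576930, -2.685897, 2.865269),
  mTADs = c("chr2:167337500-167447500",
            "chr17:40957500-41062500", "chr17:40957500-41062500",
            "chr13:100777500-100877500", "chr13:100777500-100877500"),
  stringsAsFactors = FALSE
)

# reorder() builds a factor whose levels are sorted by the per-mTAD
# median of stat, so no merge or hand-built level vector is needed.
thing$mTADs.reordered <- reorder(thing$mTADs, thing$stat, FUN = median)

# Levels now run from the lowest to the highest median stat.
levels(thing$mTADs.reordered)
```

Because reorder() computes the per-group summary itself, there are no duplicated levels to worry about.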
> thing
mTADs ID stat DE_genes median stat for mTAD mTADs.reordered
1 chr1:174307500-174407500 5273 -3.349795 chr1:174405489-174408706:+ -3.349795 chr1:174307500-174407500
2 chr10:18157500-18262500 5269 9.812369 chr10:18259929-18265882:- 9.812369 chr10:18157500-18262500
3 chr11:48597500-48682500 5278 -5.605531 chr11:48675470-48685185:- 3.554733 chr11:48597500-48682500
4 chr11:48597500-48682500 5279 3.554733 chr11:48639642-48665711:+ 3.554733 chr11:48597500-48682500
5 chr11:48597500-48682500 5280 4.399655 chr11:48638848-48640157:- 3.554733 chr11:48597500-48682500
6 chr11:72251250-72352500 5272 2.952151 chr11:72254857-72265270:+ 2.952151 chr11:72251250-72352500
7 chr13:100777500-100877500 5274 -2.685897 chr13:100787949-100874025:- 0.089686 chr13:100777500-100877500
8 chr13:100777500-100877500 5275 2.865269 chr13:100718488-100785594:- 0.089686 chr13:100777500-100877500
9 chr17:40957500-41062500 5270 3.371969 chr17:41060000-41071996:- 3.974449 chr17:40957500-41062500
10 chr17:40957500-41062500 5271 4.576930 chr17:41012431-41017507:- 3.974449 chr17:40957500-41062500
11 chr2:167337500-167447500 5267 -5.452819 chr2:167318145-167341673:+ -5.452819 chr2:167337500-167447500
12 chr4:150417500-150517500 5276 6.436959 chr4:150377761-150418774:- 6.436959 chr4:150417500-150517500
13 chr6:41532500-41642500 5268 4.114012 chr6:41555481-41570508:+ 4.114012 chr6:41532500-41642500
14 chr7:6072500-6162500 5277 2.622196 chr7:6123828-6142951:+ 2.622196 chr7:6072500-6162500
Now draw a simple dot plot.
library(ggplot2)
ggplot(thing, aes(x = stat, y = mTADs.reordered)) + geom_point(shape = 20, size = 3.3)
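If the horizontal bar chart described in the question is still preferred, one possible sketch is to put the genes on the y-axis and give each mTAD its own facet row. This assumes ggplot2 3.3.0 or later (so that geom_col() can infer the horizontal orientation) and uses a five-row subset of the question's data; with 1700 mTADs the plot would likely need to be split across several pages or restricted to a subset.

```r
library(ggplot2)

# Five rows copied from the question's data frame (illustrative subset only).
thing <- data.frame(
  stat = c(-5.452819, 3.371969, 4.576930, -2.685897, 2.865269),
  mTADs = c("chr2:167337500-167447500",
            "chr17:40957500-41062500", "chr17:40957500-41062500",
            "chr13:100777500-100877500", "chr13:100777500-100877500"),
  DE_genes = c("chr2:167318145-167341673:+",
               "chr17:41060000-41071996:-", "chr17:41012431-41017507:-",
               "chr13:100787949-100874025:-", "chr13:100718488-100785594:-"),
  stringsAsFactors = FALSE
)

# One horizontal bar per gene; each mTAD gets its own facet row.
# Free y scales and spacing keep only that mTAD's genes in each panel.
p <- ggplot(thing, aes(x = stat, y = DE_genes)) +
  geom_col() +
  facet_grid(mTADs ~ ., scales = "free_y", space = "free_y") +
  theme(strip.text.y = element_text(angle = 0))  # readable facet labels

print(p)
```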