I have a data.frame that looks like this:
gvs order labels
1 -2.3321916 1 Adygei
2 -1.4996229 5 Basque
3 1.7958170 15 French
4 2.5543214 19 Italian
5 -2.7758460 33 Orcadian
6 -1.9659984 39 Russian
7 2.1239768 41 Sardinian
8 -1.8515908 47 Tuscan
9 -1.5597359 6 Bedouin
10 -1.2534511 14 Druze
11 -0.1625003 31 Mozabite
12 -1.0265275 35 Palestinian
13 -0.8519079 2 Balochi
14 -2.4279528 8 Brahui
15 -3.1717421 9 Burusho
16 -0.9258497 17 Hazara
17 -1.2207974 21 Kalash
18 -1.0325107 24 Makrani
19 -3.2102686 37 Pathan
20 -0.9377928 43 Sindhi
21 -1.7657017 48 Uygurf
22 -0.5058627 10 Cambodian
23 -0.7819299 12 Dai
24 -1.4095947 13 Daur
25 2.2810477 16 Han
26 -0.9007551 18 Hezhen
27 2.6614486 20 Japanese
28 -0.9441980 23 Lahu
29 -0.7237586 29 Miao
30 -0.9452944 30 Mongola
31 -1.2035258 32 Naxi
32 -0.7703779 34 Oroqen
33 -3.0895998 42 She
34 -0.7037952 45 Tu
35 -1.9311354 46 Tujia
36 -0.5423822 49 Xibo
37 -1.6244801 50 Yakut
38 -0.9049735 51 Yi
39 -2.6491331 11 Colombian
40 2.3706977 22 Karitiana
41 -2.7590587 26 Maya
42 -0.9614190 38 Pima
43 -1.6961014 44 Surui
44 -0.8449225 28 Melanesian
45 -1.1163019 36 Papuan
46 -0.9298674 3 BantuKenya
47 -2.8859587 4 BantuSouthAfrica
48 -1.4494841 7 BiakaPygmy
49 -0.7381369 25 Mandenka
50 -0.5644325 27 MbutiPygmy
51 -0.9195156 40 San
52 2.0949378 52 Yoruba
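I would like to plot column gvs, with the points placed along the x axis in the order given by column order and each point's x-axis label taken from column labels. Does anyone know how to do this? I'd like the plot to look like a less colorful version of the plot in Figure 5 of this paper: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004412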
Answer 0 (score: 1)
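Based on your comment, it looks like (1) labels does not correspond to gvs and order, and (2) if I sort the first two columns by order, the data frame will be correctly ordered. Let me know if this isn't correct.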
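Sort the first two columns by the order column, leaving the third column as it is:
df[,c("gvs","order")] = df[order(df$order), c("gvs","order")]
Set the ordering of labels based on the current order of labels in the sample data frame:
df$labels = factor(df$labels, levels=df$labels)
Add a grouping variable for the regions. I did this by creating a new group each time the alphabetical ordering of labels "goes backward". The regions here are just numbers, but you can give them descriptive names if you want to use them:
df$group = c(0, cumsum(diff(match(substr(df$labels,1,1), LETTERS)) < 0))
Add fake p-values (because point size is based on p-values in the graph you linked to):
set.seed(595)
df$p.value = runif(nrow(df), 0, 0.5)
Plot the data, including a different color for each region group, point size based on p-value, and a black border around points with p < 0.05.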
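No code is shown for this plotting step. A minimal ggplot2 sketch of the plot described might look like the following. It assumes ggplot2 is installed, uses only the first two regions of the question's data (and skips the re-sorting step) so that it runs on its own, and the styling choices (fill shape 21, grey versus black borders via scale_colour_manual) are assumptions rather than the answerer's original code.

```r
library(ggplot2)

# Assumption: a subset of the question's data (first two regions only).
df <- data.frame(
  gvs = c(-2.3321916, -1.4996229, 1.795817, 2.5543214, -2.775846, -1.9659984,
          2.1239768, -1.8515908, -1.5597359, -1.2534511, -0.1625003, -1.0265275),
  order = c(1L, 5L, 15L, 19L, 33L, 39L, 41L, 47L, 6L, 14L, 31L, 35L),
  labels = c("Adygei", "Basque", "French", "Italian", "Orcadian", "Russian",
             "Sardinian", "Tuscan", "Bedouin", "Druze", "Mozabite", "Palestinian"),
  stringsAsFactors = FALSE
)

# Keep the x axis in the current row order of the data frame.
df$labels <- factor(df$labels, levels = df$labels)

# Start a new region each time the first letter of labels goes back in the alphabet.
df$group <- c(0, cumsum(diff(match(substr(df$labels, 1, 1), LETTERS)) < 0))

# Fake p-values, as in the answer.
set.seed(595)
df$p.value <- runif(nrow(df), 0, 0.5)

p <- ggplot(df, aes(x = labels, y = gvs)) +
  # Shape 21 has a separate fill (region) and border colour (significance).
  geom_point(aes(fill = factor(group), size = p.value, colour = p.value < 0.05),
             shape = 21, stroke = 1) +
  scale_colour_manual(values = c("FALSE" = "grey70", "TRUE" = "black"),
                      name = "p < 0.05") +
  labs(x = NULL, fill = "region") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(p)
```

With these 12 rows the grouping rule yields group 0 for the eight European populations and group 1 for the four Middle Eastern ones, since "Tuscan" to "Bedouin" is the only step where the first letter moves backward.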
Add regional means with geom_line:
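A minimal sketch of the regional-means step, assuming ggplot2 is installed and using base R's ave() to compute each region's mean gvs. The eight rows are taken from the question's data; the rest is illustrative and not the answerer's original code. Because the x axis is discrete, geom_line with group = group joins each region's rows at the same height, giving one horizontal segment per region.

```r
library(ggplot2)

# Assumption: two regions of four populations each, values from the question.
df <- data.frame(
  gvs = c(-2.3321916, -1.4996229, 1.795817, 2.5543214,
          -1.5597359, -1.2534511, -0.1625003, -1.0265275),
  labels = c("Adygei", "Basque", "French", "Italian",
             "Bedouin", "Druze", "Mozabite", "Palestinian"),
  stringsAsFactors = FALSE
)
df$labels <- factor(df$labels, levels = df$labels)
df$group <- c(0, cumsum(diff(match(substr(df$labels, 1, 1), LETTERS)) < 0))

# Mean gvs of each region, repeated on every row belonging to that region.
df$region.mean <- ave(df$gvs, df$group)

p <- ggplot(df, aes(x = labels, y = gvs)) +
  geom_point(aes(colour = factor(group))) +
  # One horizontal line per region, spanning that region's labels.
  geom_line(aes(y = region.mean, group = group, colour = factor(group))) +
  labs(x = NULL, colour = "region") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(p)
```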
Answer 1 (score: 1)
Read in the data frame:
df <- data.frame(gvs = c(-2.3321916, -1.4996229, 1.795817, 2.5543214, -2.775846, -1.9659984,
2.1239768, -1.8515908, -1.5597359, -1.2534511, -0.1625003, -1.0265275,
-0.8519079, -2.4279528, -3.1717421, -0.9258497, -1.2207974, -1.0325107,
-3.2102686, -0.9377928, -1.7657017, -0.5058627, -0.7819299, -1.4095947,
2.2810477, -0.9007551, 2.6614486, -0.944198, -0.7237586, -0.9452944,
-1.2035258, -0.7703779, -3.0895998, -0.7037952, -1.9311354, -0.5423822,
-1.6244801, -0.9049735, -2.6491331, 2.3706977, -2.7590587, -0.961419,
-1.6961014, -0.8449225, -1.1163019, -0.9298674, -2.8859587, -1.4494841,
-0.7381369, -0.5644325, -0.9195156, 2.0949378),
order = c(1L, 5L, 15L, 19L, 33L, 39L, 41L, 47L, 6L, 14L, 31L, 35L, 2L,
8L, 9L, 17L, 21L, 24L, 37L, 43L, 48L, 10L, 12L, 13L, 16L, 18L,
20L, 23L, 29L, 30L, 32L, 34L, 42L, 45L, 46L, 49L, 50L, 51L, 11L,
22L, 26L, 38L, 44L, 28L, 36L, 3L, 4L, 7L, 25L, 27L, 40L, 52L),
labels = c("Adygei", "Basque", "French", "Italian", "Orcadian", "Russian",
"Sardinian", "Tuscan", "Bedouin", "Druze", "Mozabite", "Palestinian",
"Balochi", "Brahui", "Burusho", "Hazara", "Kalash", "Makrani",
"Pathan", "Sindhi", "Uygurf", "Cambodian", "Dai", "Daur", "Han",
"Hezhen", "Japanese", "Lahu", "Miao", "Mongola", "Naxi", "Oroqen",
"She", "Tu", "Tujia", "Xibo", "Yakut", "Yi", "Colombian", "Karitiana",
"Maya", "Pima", "Surui", "Melanesian", "Papuan", "BantuKenya",
"BantuSouthAfrica", "BiakaPygmy", "Mandenka", "MbutiPygmy", "San",
"Yoruba"))
Order the data:
df.ordered <- df[ order(df$order) , ]
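And a simple (ugly) sample plot, which you can certainly improve (maybe with ggplot):
plot(df.ordered$gvs, pch = 19, xaxt = "n", xlab = "")  # suppress the default numeric x axis
axis(1, at=1:52, labels=df.ordered$labels, las=2)      # one vertical label per point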
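Answer 2 (score: 1)
Another option, which does not depend on how the data frame is sorted, is to use the limits argument of the discrete scale (as a side benefit, this lets you apply more arbitrary orderings when plotting):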
library(ggplot2)
df <- read.csv('/path/to/file/df.csv')
xorder <- df[order(df$order), 'labels']   # labels sorted by the order column
ggplot(df, aes(x=labels, y=gvs, size=gvs)) +
  geom_point() +
  scale_x_discrete(limits=xorder) +       # x-axis categories appear in this order
  theme(axis.text.x=element_text(angle=90))
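To try the limits approach without a CSV file, the sketch below builds a small data frame from five rows of the question's data, as a hypothetical stand-in for the read.csv() call. The data frame itself is never sorted; only the scale's limits carry the ordering.

```r
library(ggplot2)

# Hypothetical stand-in for read.csv(): five rows from the question's data.
df <- data.frame(
  gvs = c(-2.3321916, -1.4996229, 1.795817, -1.5597359, -0.8519079),
  order = c(1L, 5L, 15L, 6L, 2L),
  labels = c("Adygei", "Basque", "French", "Bedouin", "Balochi"),
  stringsAsFactors = FALSE
)

# Labels sorted by the order column; df keeps its original row order.
xorder <- df[order(df$order), "labels"]

p <- ggplot(df, aes(x = labels, y = gvs)) +
  geom_point() +
  scale_x_discrete(limits = xorder) +
  theme(axis.text.x = element_text(angle = 90))

print(p)
```

Here xorder is c("Adygei", "Balochi", "Basque", "Bedouin", "French"), so the x axis follows the order column even though the rows of df are not in that order.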