Ordering the x-axis in an R plot

Time: 2016-07-20 18:00:11

Tags: r graph

I have a data.frame that looks like this:

          gvs order           labels
1  -2.3321916     1           Adygei
2  -1.4996229     5           Basque
3   1.7958170    15           French
4   2.5543214    19          Italian
5  -2.7758460    33         Orcadian
6  -1.9659984    39          Russian
7   2.1239768    41        Sardinian
8  -1.8515908    47           Tuscan
9  -1.5597359     6          Bedouin
10 -1.2534511    14            Druze
11 -0.1625003    31         Mozabite
12 -1.0265275    35      Palestinian
13 -0.8519079     2          Balochi
14 -2.4279528     8           Brahui
15 -3.1717421     9          Burusho
16 -0.9258497    17           Hazara
17 -1.2207974    21           Kalash
18 -1.0325107    24          Makrani
19 -3.2102686    37           Pathan
20 -0.9377928    43           Sindhi
21 -1.7657017    48           Uygurf
22 -0.5058627    10        Cambodian
23 -0.7819299    12              Dai
24 -1.4095947    13             Daur
25  2.2810477    16              Han
26 -0.9007551    18           Hezhen
27  2.6614486    20         Japanese
28 -0.9441980    23             Lahu
29 -0.7237586    29             Miao
30 -0.9452944    30          Mongola
31 -1.2035258    32             Naxi
32 -0.7703779    34           Oroqen
33 -3.0895998    42              She
34 -0.7037952    45               Tu
35 -1.9311354    46            Tujia
36 -0.5423822    49             Xibo
37 -1.6244801    50            Yakut
38 -0.9049735    51               Yi
39 -2.6491331    11        Colombian
40  2.3706977    22        Karitiana
41 -2.7590587    26             Maya
42 -0.9614190    38             Pima
43 -1.6961014    44            Surui
44 -0.8449225    28       Melanesian
45 -1.1163019    36           Papuan
46 -0.9298674     3       BantuKenya
47 -2.8859587     4 BantuSouthAfrica
48 -1.4494841     7       BiakaPygmy
49 -0.7381369    25         Mandenka
50 -0.5644325    27       MbutiPygmy
51 -0.9195156    40              San
52  2.0949378    52           Yoruba

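I want to plot the column gvs along the x-axis in the order given by the column order, with the label for each point on the x-axis taken from the column labels. Does anyone know how to do this? I would like the plot to look like a less colourful version of Figure 5 in this paper: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004412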

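3 Answers:

Answer 0: (score: 1)

Based on your comment, it looks like (1) labels does not correspond to gvs and order, and (2) the data frame will be sorted correctly if I sort the first two columns by order. Let me know if that's incorrect.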

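Sort the first two columns by order, leaving the third column alone:

df[,c("gvs","order")] = df[order(df$order), c("gvs","order")]

Set the ordering of labels based on the current order of labels in the sample data frame:

df$labels = factor(df$labels, levels=df$labels)

Add a grouping variable for region. I've done this by creating a new group each time the alphabetical ordering of labels goes "backwards". The regions here are just numbers, but you can give them descriptive names if you want to use them:

df$group = c(0, cumsum(diff(match(substr(df$labels,1,1), LETTERS)) < 0))

Add fake p-values (since point size is based on p-values in the graph you linked to):

set.seed(595)
df$p.value = runif(nrow(df), 0, 0.5)

Plot the data, with a different colour for each region group, point size based on the p-value, and a black border around points where p < 0.05. geom_line adds the region means.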
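The plotting call for this last step is not shown here. A minimal sketch of what such a plot could look like, assuming ggplot2 is installed: the six-row data frame toy, its group numbers and its p-values are hypothetical, and the choice of aesthetics (including -log10(p) for point size) is an assumption based on the description, not the original code.

```r
library(ggplot2)

# Hypothetical six-row subset, already in plotting order;
# group and p.value are illustrative values, not real results
toy <- data.frame(
  gvs     = c(-2.33, -1.50, 1.80, -1.56, -1.25, -0.16),
  labels  = c("Adygei", "Basque", "French", "Bedouin", "Druze", "Mozabite"),
  group   = c(0, 0, 0, 1, 1, 1),
  p.value = c(0.30, 0.02, 0.04, 0.25, 0.01, 0.40)
)
# Fix the x-axis order to the row order rather than alphabetical order
toy$labels <- factor(toy$labels, levels = toy$labels)
# Mean gvs within each region group, repeated on every row of that group
toy$group.mean <- ave(toy$gvs, toy$group)

p <- ggplot(toy, aes(x = labels, y = gvs, colour = factor(group))) +
  # One horizontal line per region, drawn at the group mean
  geom_line(aes(y = group.mean, group = group)) +
  # Smaller p-values give larger points
  geom_point(aes(size = -log10(p.value))) +
  # Black ring around the points with p < 0.05
  geom_point(data = toy[toy$p.value < 0.05, ], aes(size = -log10(p.value)),
             shape = 21, colour = "black") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(colour = "group", size = "-log10(p)")

print(p)
```

With the full data frame, df would take the place of toy, with its group and p.value columns built as in the steps above.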

Answer 1: (score: 1)

Read in the data frame:

df <- data.frame(gvs = c(-2.3321916, -1.4996229, 1.795817, 2.5543214, -2.775846, -1.9659984, 
                      2.1239768, -1.8515908, -1.5597359, -1.2534511, -0.1625003, -1.0265275, 
                      -0.8519079, -2.4279528, -3.1717421, -0.9258497, -1.2207974, -1.0325107, 
                      -3.2102686, -0.9377928, -1.7657017, -0.5058627, -0.7819299, -1.4095947, 
                      2.2810477, -0.9007551, 2.6614486, -0.944198, -0.7237586, -0.9452944, 
                      -1.2035258, -0.7703779, -3.0895998, -0.7037952, -1.9311354, -0.5423822, 
                      -1.6244801, -0.9049735, -2.6491331, 2.3706977, -2.7590587, -0.961419, 
                      -1.6961014, -0.8449225, -1.1163019, -0.9298674, -2.8859587, -1.4494841, 
                      -0.7381369, -0.5644325, -0.9195156, 2.0949378),
             order = c(1L, 5L, 15L, 19L, 33L, 39L, 41L, 47L, 6L, 14L, 31L, 35L, 2L, 
                       8L, 9L, 17L, 21L, 24L, 37L, 43L, 48L, 10L, 12L, 13L, 16L, 18L, 
                       20L, 23L, 29L, 30L, 32L, 34L, 42L, 45L, 46L, 49L, 50L, 51L, 11L, 
                       22L, 26L, 38L, 44L, 28L, 36L, 3L, 4L, 7L, 25L, 27L, 40L, 52L),
             labels = c("Adygei", "Basque", "French", "Italian", "Orcadian", "Russian", 
                        "Sardinian", "Tuscan", "Bedouin", "Druze", "Mozabite", "Palestinian", 
                        "Balochi", "Brahui", "Burusho", "Hazara", "Kalash", "Makrani", 
                        "Pathan", "Sindhi", "Uygurf", "Cambodian", "Dai", "Daur", "Han", 
                        "Hezhen", "Japanese", "Lahu", "Miao", "Mongola", "Naxi", "Oroqen", 
                        "She", "Tu", "Tujia", "Xibo", "Yakut", "Yi", "Colombian", "Karitiana", 
                        "Maya", "Pima", "Surui", "Melanesian", "Papuan", "BantuKenya", 
                        "BantuSouthAfrica", "BiakaPygmy", "Mandenka", "MbutiPygmy", "San", 
                        "Yoruba"))

Order the data:

df.ordered <- df[ order(df$order) , ]
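To see what this indexing does, here is a small sketch on a hypothetical three-row subset (the name toy and the rounded values are illustrative only). order() returns the row positions that would sort the column, and indexing the rows with that vector moves whole rows, so each gvs value stays with its label.

```r
# Hypothetical three-row subset of the data frame
toy <- data.frame(
  gvs    = c(1.80, -2.33, -1.50),
  order  = c(15L, 1L, 5L),
  labels = c("French", "Adygei", "Basque")
)

# order() returns the row indices that sort the column in ascending order
idx <- order(toy$order)
print(idx)

# Indexing rows with that permutation keeps gvs, order and labels together
toy.ordered <- toy[idx, ]
print(toy.ordered)
```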

And a simple (ugly) sample plot, which you can surely improve (perhaps with ggplot):

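# Suppress the default numeric x-axis so the labels do not overprint it
plot(df.ordered$gvs, pch = 19, xaxt = "n", xlab = "", ylab = "gvs")
axis(1, at = seq_len(nrow(df.ordered)), labels = df.ordered$labels, las = 2)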
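One likely improvement to the base plot: long rotated labels such as BantuSouthAfrica may be clipped by the default bottom margin. A sketch of one way around that, on a hypothetical four-row subset; the margin and cex.axis values are guesses that may need tuning for the full data.

```r
# Hypothetical subset that includes one long label
toy <- data.frame(
  gvs    = c(-2.33, -0.93, -2.89, 2.09),
  labels = c("Adygei", "BantuKenya", "BantuSouthAfrica", "Yoruba")
)

# Widen the bottom margin (the default is about 5 lines) so rotated labels fit;
# par() returns the previous settings so they can be restored afterwards
old.par <- par(mar = c(9, 4, 2, 1))
plot(toy$gvs, pch = 19, xaxt = "n", xlab = "", ylab = "gvs")
axis(1, at = seq_len(nrow(toy)), labels = toy$labels, las = 2, cex.axis = 0.8)
par(old.par)  # restore the previous margins
```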

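Answer 2: (score: 1)

Another option, which does not depend on the ordering of the data frame, is to use the limits argument of the discrete scale (as a side benefit, this lets you apply more arbitrary orderings at plot time).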

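library(ggplot2)

# Read the data; the path is a placeholder for your own file
df <- read.csv('/path/to/file/df.csv')

# Labels in the order given by the order column, as a character vector
xorder <- as.character(df[order(df$order), 'labels'])
ggplot(df, aes(x=labels, y=gvs, size=gvs)) +
  geom_point() +
  scale_x_discrete(limits=xorder) +
  theme(axis.text.x=element_text(angle=90))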
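As an illustration of the "more arbitrary orderings" point, limits accepts any character vector of the labels, so the same data can be drawn in a different order without touching the data frame. A minimal sketch, assuming ggplot2 is installed; the four-row data frame toy and the choice of descending gvs as the ordering are hypothetical.

```r
library(ggplot2)

# Hypothetical four-row subset of the data frame
toy <- data.frame(
  gvs    = c(-2.33, 1.80, -1.50, 2.55),
  order  = c(1L, 15L, 5L, 19L),
  labels = c("Adygei", "French", "Basque", "Italian"),
  stringsAsFactors = FALSE
)

# Any character vector of labels can serve as limits; here, descending gvs
by.gvs <- toy$labels[order(-toy$gvs)]

p <- ggplot(toy, aes(x = labels, y = gvs)) +
  geom_point() +
  scale_x_discrete(limits = by.gvs) +
  theme(axis.text.x = element_text(angle = 90))

print(p)
```

As far as I know, labels left out of limits are dropped from the plot, so the same argument can also be used to show only a subset of the populations.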