Coloring different clusters of points in a ggplot scatter plot

Asked: 2012-12-06 00:36:25

Tags: r ggplot2

I am creating a scatter plot with ggplot:

library(ggplot2)
mydata <- read.table('CF1_deNovoAssembly.csv', sep=",", header=TRUE)
ggplot(mydata, aes(log(Consensus.length), log(Average.coverage))) + geom_point()

Data in CF1_deNovoAssembly.csv:

Name    Consensus length    Total read count    Single reads    Reads in pairs  Average coverage
CF1_seqReads contig 1 mapping   81148   77393   45653   31740   68.39
CF1_seqReads contig 2 mapping   5175    4154    2526    1628    57.33
CF1_seqReads contig 3 mapping   43676   43232   25550   17682   70.951
CF1_seqReads contig 4 mapping   33156   28321   16619   11702   61.458
CF1_seqReads contig 5 mapping   194560  158576  93416   65160   58.476
CF1_seqReads contig 6 mapping   26990   27221   16183   11038   72.267
CF1_seqReads contig 7 mapping   35155   34449   20227   14222   70.2
CF1_seqReads contig 8 mapping   110217  111889  65611   46278   73.075
CF1_seqReads contig 9 mapping   96757   87785   51431   36354   65.275
CF1_seqReads contig 10 mapping  169489  155776  91690   64086   65.993
CF1_seqReads contig 11 mapping  280769  215666  126964  88702   55.204
CF1_seqReads contig 12 mapping  29819   30563   17993   12570   73.624
CF1_seqReads contig 13 mapping  120801  116090  68428   47662   69.046
CF1_seqReads contig 14 mapping  172189  154880  91940   62940   64.499
CF1_seqReads contig 15 mapping  105798  88828   52338   36490   60.352
CF1_seqReads contig 16 mapping  212719  200557  117997  82560   67.748
CF1_seqReads contig 17 mapping  36352   29426   17354   12072   57.996
CF1_seqReads contig 18 mapping  1468    2594    1622    972 126.813
CF1_seqReads contig 19 mapping  123801  121038  71234   49804   70.139
CF1_seqReads contig 20 mapping  231369  226726  133732  92994   70.348
CF1_seqReads contig 21 mapping  125419  110004  64774   45230   62.915
CF1_seqReads contig 22 mapping  125818  113356  67034   46322   64.733
CF1_seqReads contig 23 mapping  53872   50388   29824   20564   67.235
CF1_seqReads contig 24 mapping  118273  99252   58798   40454   60.263
CF1_seqReads contig 25 mapping  5569    19834   11758   8076    257.753
CF1_seqReads contig 26 mapping  48830   47879   28265   19614   70.306
CF1_seqReads contig 27 mapping  33566   32370   19280   13090   69.097
CF1_seqReads contig 28 mapping  8357    6684    4046    2638    56.178
CF1_seqReads contig 29 mapping  82328   71998   42670   29328   62.916
CF1_seqReads contig 30 mapping  55288   52415   31023   21392   68.03
CF1_seqReads contig 31 mapping  49849   44216   26142   18074   63.699
CF1_seqReads contig 32 mapping  66991   69598   41202   28396   74.615
CF1_seqReads contig 33 mapping  210958  187922  110992  76930   63.938
CF1_seqReads contig 34 mapping  95028   86002   51080   34922   64.925
CF1_seqReads contig 35 mapping  25219   22685   13567   9118    65.146
CF1_seqReads contig 36 mapping  52506   44863   26493   18370   61.281
CF1_seqReads contig 37 mapping  44807   37939   22745   15194   60.863
CF1_seqReads contig 38 mapping  30091   25919   15355   10564   62.312
CF1_seqReads contig 39 mapping  49730   42295   25445   16850   60.872
CF1_seqReads contig 40 mapping  35166   27239   16101   11138   55.456
CF1_seqReads contig 41 mapping  58239   54831   32311   22520   67.764
CF1_seqReads contig 42 mapping  78398   69994   41578   28416   64.135
CF1_seqReads contig 43 mapping  79163   61667   36637   25030   55.958
CF1_seqReads contig 44 mapping  46179   37621   22479   15142   58.463
CF1_seqReads contig 45 mapping  1501    1209    715 494 55.69
CF1_seqReads contig 46 mapping  35505   36158   21296   14862   73.271
CF1_seqReads contig 47 mapping  108945  100876  59394   41482   66.479
CF1_seqReads contig 48 mapping  36042   30283   17961   12322   60.289
CF1_seqReads contig 49 mapping  125139  102821  60441   42380   59.021
CF1_seqReads contig 50 mapping  33093   31998   18976   13022   69.715
CF1_seqReads contig 51 mapping  19399   14764   8826    5938    54.607
CF1_seqReads contig 52 mapping  39627   30320   17856   12464   54.848
CF1_seqReads contig 53 mapping  12163   9861    5887    3974    58.008
CF1_seqReads contig 54 mapping  4378    3872    2442    1430    62.841
CF1_seqReads contig 55 mapping  107763  96191   56993   39198   64.165
CF1_seqReads contig 56 mapping  167629  143032  84032   59000   61.441
CF1_seqReads contig 57 mapping  97622   80176   47622   32554   58.829
CF1_seqReads contig 58 mapping  56912   56028   32850   23178   70.506
CF1_seqReads contig 59 mapping  15390   16360   9792    6568    76.745
CF1_seqReads contig 60 mapping  80202   71909   42337   29572   64.292
CF1_seqReads contig 61 mapping  45435   39732   23290   16442   62.592
CF1_seqReads contig 62 mapping  17972   15752   9208    6544    63.102
CF1_seqReads contig 63 mapping  41256   40603   23859   16744   70.545
CF1_seqReads contig 64 mapping  110461  93608   54796   38812   60.845
CF1_seqReads contig 65 mapping  62066   53798   31662   22136   62.125
CF1_seqReads contig 66 mapping  1981    1788    1112    676 63.459
CF1_seqReads contig 67 mapping  32249   28939   17121   11818   64.486
CF1_seqReads contig 68 mapping  30129   30299   17873   12426   72.002
CF1_seqReads contig 69 mapping  73494   70081   41307   28774   68.502
CF1_seqReads contig 70 mapping  42147   32350   19106   13244   54.965
CF1_seqReads contig 71 mapping  15109   14803   8827    5976    70.037
CF1_seqReads contig 72 mapping  19446   17197   10277   6920    63.506
CF1_seqReads contig 73 mapping  1203    2160    1410    750 127.011
CF1_seqReads contig 74 mapping  35575   31557   18907   12650   63.833
CF1_seqReads contig 75 mapping  61658   52593   31031   21562   61.218
CF1_seqReads contig 76 mapping  2104    2063    1335    728 69.914
CF1_seqReads contig 77 mapping  58182   49734   29348   20386   61.311
CF1_seqReads contig 78 mapping  55182   54095   32319   21776   70.398
CF1_seqReads contig 79 mapping  35523   34002   19964   14038   68.577
CF1_seqReads contig 80 mapping  5174    8766    5222    3544    119.842
CF1_seqReads contig 81 mapping  69777   59263   35069   24194   60.855
CF1_seqReads contig 82 mapping  23575   21660   12872   8788    65.608
CF1_seqReads contig 83 mapping  3065    2609    1597    1012    61.1
CF1_seqReads contig 84 mapping  332 803 619 184 171.226
CF1_seqReads contig 85 mapping  5538    5060    3028    2032    63.651
CF1_seqReads contig 86 mapping  18727   16636   9814    6822    63.747
CF1_seqReads contig 87 mapping  27818   21227   12585   8642    54.79
CF1_seqReads contig 88 mapping  20439   17310   10266   7044    60.577
CF1_seqReads contig 89 mapping  14937   13026   7656    5370    62.693
CF1_seqReads contig 90 mapping  17570   16529   9787    6742    67.656
CF1_seqReads contig 91 mapping  7927    7372    4374    2998    66.942
CF1_seqReads contig 92 mapping  2695    5155    3143    2012    136
CF1_seqReads contig 93 mapping  28431   22662   13382   9280    57.128
CF1_seqReads contig 94 mapping  10910   8378    5032    3346    54.889
CF1_seqReads contig 95 mapping  11426   11337   6863    4474    70.898
CF1_seqReads contig 96 mapping  39433   36586   21812   14774   66.563
CF1_seqReads contig 97 mapping  65815   66239   39289   26950   72.083
CF1_seqReads contig 98 mapping  11296   11627   6991    4636    73.84
CF1_seqReads contig 99 mapping  27785   22040   13130   8910    56.893
CF1_seqReads contig 100 mapping 26131   20073   11793   8280    55.234
CF1_seqReads contig 101 mapping 825 766 560 206 61.246
CF1_seqReads contig 102 mapping 25869   25524   15286   10238   70.695
CF1_seqReads contig 103 mapping 7747    7244    4356    2888    66.154
CF1_seqReads contig 104 mapping 34292   28755   16913   11842   60.05
CF1_seqReads contig 105 mapping 17219   16000   9346    6654    66.858
CF1_seqReads contig 106 mapping 39990   34798   20590   14208   62.384
CF1_seqReads contig 107 mapping 38227   33283   19721   13562   62.381
CF1_seqReads contig 108 mapping 1825    1439    919 520 54.89
CF1_seqReads contig 109 mapping 5333    4212    2494    1718    57.046
CF1_seqReads contig 110 mapping 13827   11248   6582    4666    58.276
CF1_seqReads contig 111 mapping 25486   22477   13277   9200    63.393
CF1_seqReads contig 112 mapping 15592   13751   8295    5456    63.048
CF1_seqReads contig 113 mapping 6230    4864    2986    1878    55.995
CF1_seqReads contig 114 mapping 28229   22164   13150   9014    56.051
CF1_seqReads contig 115 mapping 92951   92630   54674   37956   71.557
CF1_seqReads contig 116 mapping 24347   24204   14532   9672    71.386
CF1_seqReads contig 117 mapping 11556   11295   6657    4638    70.199
CF1_seqReads contig 118 mapping 2750    2553    1683    870 64.722
CF1_seqReads contig 119 mapping 19046   14586   8706    5880    54.681
CF1_seqReads contig 120 mapping 19966   17390   10290   7100    62.622
CF1_seqReads contig 121 mapping 1912    1657    1011    646 62.048
CF1_seqReads contig 122 mapping 1236    5497    3435    2062    318.75
CF1_seqReads contig 123 mapping 1136    852 584 268 53.619
CF1_seqReads contig 124 mapping 414 391 273 118 62.2
CF1_seqReads contig 125 mapping 912 931 619 312 72.031
CF1_seqReads contig 126 mapping 915 588 408 180 43.635
CF1_seqReads contig 127 mapping 2039    1853    1165    688 64.089
CF1_seqReads contig 128 mapping 1471    1253    837 416 58.997
CF1_seqReads contig 129 mapping 1148    2382    1560    822 147.665
CF1_seqReads contig 130 mapping 23233   23367   14443   8924    71.842
CF1_seqReads contig 131 mapping 702 472 324 148 47.107
CF1_seqReads contig 132 mapping 855 1461    967 494 120.706
CF1_seqReads contig 133 mapping 461 1027    725 302 157.434
CF1_seqReads contig 134 mapping 1136    834 580 254 52.482
CF1_seqReads contig 135 mapping 1222    1681    1131    550 98.43
CF1_seqReads contig 136 mapping 1316    997 689 308 53.191
CF1_seqReads contig 137 mapping 1923    1880    1204    676 68.222
CF1_seqReads contig 138 mapping 903 601 401 200 47.503
CF1_seqReads contig 139 mapping 604 495 367 128 56.925
CF1_seqReads contig 140 mapping 1854    1651    1081    570 62.929
CF1_seqReads contig 141 mapping 857 1666    1114    552 137.351
CF1_seqReads contig 142 mapping 273 264 214 50  65.048
CF1_seqReads contig 143 mapping 1848    1254    826 428 47.48
CF1_seqReads contig 144 mapping 9112    8829    5223    3606    69.287
CF1_seqReads contig 145 mapping 4959    8350    5042    3308    120.352
CF1_seqReads contig 146 mapping 1160    2386    1570    816 147.567
CF1_seqReads contig 147 mapping 3398    2919    1807    1112    59.74
CF1_seqReads contig 148 mapping 513 491 381 110 65.774
CF1_seqReads contig 149 mapping 2634    2644    1594    1050    71.279
CF1_seqReads contig 150 mapping 2333    1832    1086    746 54.456
CF1_seqReads contig 151 mapping 9929    8130    4910    3220    58.649
CF1_seqReads contig 152 mapping 4867    4591    2765    1826    66.831
CF1_seqReads contig 153 mapping 2244    1984    1278    706 61.906
CF1_seqReads contig 154 mapping 3008    2557    1581    976 61.333
CF1_seqReads contig 155 mapping 553 1015    733 282 130.448
CF1_seqReads contig 156 mapping 735 974 662 312 91.188
CF1_seqReads contig 157 mapping 1375    2157    1507    650 110.765
CF1_seqReads contig 158 mapping 211 168 160 8   54.796
CF1_seqReads contig 159 mapping 211 174 160 14  56.749
CF1_seqReads contig 160 mapping 3076    3113    1855    1258    73.188
CF1_seqReads contig 161 mapping 1965    1474    998 476 51.869
CF1_seqReads contig 162 mapping 2495    2055    1301    754 57.74
CF1_seqReads contig 163 mapping 230 201 183 18  59.178
CF1_seqReads contig 164 mapping 899 1786    1176    610 140.673
CF1_seqReads contig 165 mapping 3860    2683    1643    1040    49.358
CF1_seqReads contig 166 mapping 1207    1064    642 422 62.839
CF1_seqReads contig 167 mapping 6068    5769    3555    2214    67.996
CF1_seqReads contig 168 mapping 1345    980 628 352 51.059
CF1_seqReads contig 169 mapping 2407    2119    1233    886 62.073
CF1_seqReads contig 170 mapping 236 409 359 50  119.915
CF1_seqReads contig 171 mapping 2288    1959    1229    730 61.018
CF1_seqReads contig 172 mapping 1214    715 497 218 40.74
CF1_seqReads contig 173 mapping 323 531 431 100 113.607
CF1_seqReads contig 174 mapping 1222    789 529 260 44.583
CF1_seqReads contig 175 mapping 207 188 182 6   61.063
CF1_seqReads contig 176 mapping 2236    2204    1392    812 70.699
CF1_seqReads contig 177 mapping 1173    1189    901 288 70.116
CF1_seqReads contig 178 mapping 757 692 476 216 62.54
CF1_seqReads contig 179 mapping 238 485 413 72  137.378
CF1_seqReads contig 180 mapping 1122    984 670 314 62.156
CF1_seqReads contig 181 mapping 1717    1305    819 486 53.286
CF1_seqReads contig 182 mapping 739 1061    825 236 101.298
CF1_seqReads contig 183 mapping 377 293 231 62  54.255
CF1_seqReads contig 184 mapping 878 837 589 248 67.145
CF1_seqReads contig 185 mapping 905 786 540 246 60.841
CF1_seqReads contig 186 mapping 321 223 189 34  44.969
CF1_seqReads contig 187 mapping 215 251 221 30  77.498
CF1_seqReads contig 188 mapping 1153    1074    718 356 64.892
CF1_seqReads contig 189 mapping 568 441 303 138 53.771
CF1_seqReads contig 190 mapping 582 450 282 168 54.89
CF1_seqReads contig 191 mapping 452 767 585 182 119.653
CF1_seqReads contig 192 mapping 263 218 186 32  58.73
CF1_seqReads contig 193 mapping 313 247 193 54  54.22
CF1_seqReads contig 194 mapping 295 214 174 40  48.346
CF1_seqReads contig 195 mapping 297 197 145 52  47.007
CF1_seqReads contig 196 mapping 346 230 180 50  42.566
CF1_seqReads contig 197 mapping 392 226 180 46  37.457
CF1_seqReads contig 198 mapping 208 168 150 18  53.255
CF1_seqReads contig 199 mapping 660 586 398 188 62.903
CF1_seqReads contig 200 mapping 276 300 250 50  72.681
CF1_seqReads contig 201 mapping 388 269 231 38  45.611
CF1_seqReads contig 202 mapping 353 343 245 98  67.042
CF1_seqReads contig 203 mapping 284 175 139 36  42.144

Looking at the y-axis, I can see that the points fall into 3 groups.


Is there an algorithm that can identify each group without using the max and/or min y values?


1 Answer:

Answer 0 (score: 4)

If you want to group y using some preset values, you can use cut.

A reproducible example:

library(ggplot2)

set.seed(07122012)
DF <- data.frame(y = runif(100), x = rnorm(100))

# three equal-width groups, with breaks at 0, 1/3, 2/3 and 1
mygroups <- seq(0, 1, length.out = 4)

ggplot(DF, aes(x = x, y = y)) +
  geom_point(aes(colour = cut(y, breaks = mygroups))) +
  scale_colour_brewer('My groups', palette = 'Set2')

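Applied to the question's data, the cut approach might look like the sketch below. It uses ten rows copied from the table in the question and the column names that the question's own code relies on. The break values 4.5 and 5.4 (on the log scale) and the labels "low", "mid" and "high" are assumptions picked by eye for illustration, not values taken from the question.

```r
# Ten rows copied from the question's table; column names follow the
# question's code (Consensus.length, Average.coverage).
contigs <- data.frame(
  Consensus.length = c(81148, 5175, 194560, 1468, 5569,
                       1203, 332, 1236, 915, 392),
  Average.coverage = c(68.39, 57.33, 58.476, 126.813, 257.753,
                       127.011, 171.226, 318.75, 43.635, 37.457)
)

# Assumed breakpoints on the log scale, chosen by eye (illustrative only).
log_breaks <- c(-Inf, 4.5, 5.4, Inf)

# Store the group as a column so the plot and any later analysis share it.
contigs$group <- cut(log(contigs$Average.coverage),
                     breaks = log_breaks,
                     labels = c("low", "mid", "high"))

print(table(contigs$group))

# Build the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(contigs, aes(log(Consensus.length), log(Average.coverage))) +
    geom_point(aes(colour = group)) +
    scale_colour_brewer("Coverage group", palette = "Set2")
  # print(p) draws the plot in an interactive session
}
```

Using -Inf and Inf as the outer breaks means the grouping does not depend on the minimum or maximum y value, although the inner breaks still have to be chosen by hand.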

Or you could do some simple clustering (perhaps a combination of scale and kmeans on x and y):

ggplot(DF, aes(x=x,y=y)) + 
  geom_point(aes(colour= factor(kmeans(scale(cbind(x,y)), centers=3)$cluster))) +
  scale_colour_brewer('My groups', palette = 'Set2')

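Since the groups in the question are separated along the y-axis only, one possible variation is to cluster on log(Average.coverage) alone and store the result in a column before plotting. Calling kmeans inside aes() reruns it with random starting centres whenever the plot is built, so the colours assigned to each group may change between runs. The following is a minimal sketch under stated assumptions: the ten rows are copied from the question's table, while the seed, nstart = 25 and the "low"/"mid"/"high" labels are illustrative choices.

```r
# Ten rows copied from the question's table; column names follow the
# question's code (Consensus.length, Average.coverage).
contigs <- data.frame(
  Consensus.length = c(81148, 5175, 194560, 1468, 5569,
                       1203, 332, 1236, 915, 392),
  Average.coverage = c(68.39, 57.33, 58.476, 126.813, 257.753,
                       127.011, 171.226, 318.75, 43.635, 37.457)
)

set.seed(1)  # arbitrary seed, only to make the sketch repeatable

# Cluster on the log of the y variable only; nstart = 25 tries several
# random starts and keeps the best solution.
km <- kmeans(log(contigs$Average.coverage), centers = 3, nstart = 25)

# kmeans numbers its clusters arbitrarily, so relabel them by the order
# of their centres: lowest centre becomes "low", highest becomes "high".
centre_rank <- rank(as.vector(km$centers))
contigs$cluster <- factor(centre_rank[km$cluster],
                          levels = 1:3,
                          labels = c("low", "mid", "high"))

print(table(contigs$cluster))

# Build the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(contigs, aes(log(Consensus.length), log(Average.coverage))) +
    geom_point(aes(colour = cluster)) +
    scale_colour_brewer("k-means group", palette = "Set2")
  # print(p) draws the plot in an interactive session
}
```

This needs no hand-picked breakpoints, but the number of clusters (3 here, matching what the question reports seeing) still has to be supplied.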