Plotting a rather complex chart using R and axis break()

Time: 2012-02-08 23:10:08

Tags: r plot ggplot2 lattice

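Hi R users and programmers, I have a data set for a protein of 4563 amino acids. Amino acids in this protein were oxidized under three different treatments with two different oxidants. I would like to plot the positions of these oxidations in a chart, grouped by treatment. Different line widths will represent different oxidant concentrations, and line type (dashed or solid) will represent the type of oxidant. I would like to break the axis at every 1000 amino acids. I created a similar template with Excel and GIMP (which is time-consuming and probably not appropriate!). The 0.33 in the template is the line height. http://dl.dropbox.com/u/58221687/Chakraborty_Figure1.png. Here is the data set: http://dl.dropbox.com/u/58221687/AA-Position-template.xls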

Thanks in advance. Sourav

1 Answer:

Answer 0: (score: 7)

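I would do this in base graphics, but I'm sure someone else could do the same or better in lattice or ggplot2. I think the main thing you need to do to make this kind of plot easily is to reshape your data and rethink what format it needs to be in for plotting. I would have worked with your data directly if 1) it were in long format and 2) the variables that determine color, line type, width, etc. were available as extra columns. With data like that, you can reduce it to only the amino acids that need a line segment drawn. I have simulated a dataset that resembles yours. You should be able to modify this code to suit your case. First the dataset: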

    set.seed(1)
    # make data.frame just with info for the lines you'll actually draw
    # your data was mostly zeros, no need for those lines
    position <- sort(sample(1:4563,45,replace = FALSE))
    # but the x position needs to be shaved down!
    # the remainders (position modulo 600) are the real x positions on the plot:
    xpos <- position%%600
    # line direction appeared in your example but not in your text
    posorneg <- sample(c(-1,1),45,replace = TRUE,prob=c(.05,.95))
    # oxidant concentration for line width- just rescale the oxidant concentration
    # values you have to fall between say .5 and 3, or whatever is nice and visible
    oxconc   <- (.5+runif(45))^2
    # oxidant type determines line type- you mention 2
    # just assign these types to lines types (integers in R)
    oxitype  <- sample(c(1,2),45,replace = TRUE) 
    # let's say there's another dimension you want to map color to
    # in your example png, but not in your description.
    color <- sample(c("green","black","blue"),45,replace=TRUE)

    # and finally, which level does each segment need to belong to?
    # you have 8 line levels in your example png: positions 0-599 go on
    # level 8 (top), 600-1199 on level 7, and so on down to level 1
    level <- 8 - position %/% 600

    # now stick into data.frame:
    AminoData <-data.frame(position = position, xpos = xpos, posorneg = posorneg, 
         oxconc = oxconc, oxitype = oxitype, level = level, color = color)
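If your real table is in wide format (one row per amino acid position, one column per treatment), getting to something like `AminoData` might look roughly like the sketch below. The column names (`treatA`, `treatB`, `treatC`), the toy values, and the 0.5 to 3 line-width range are assumptions for illustration; they are not taken from your spreadsheet.

```r
# Hypothetical wide table: one row per position, one column per treatment,
# holding the oxidant concentration (0 = not oxidized). All names are made up.
wide <- data.frame(
  position = c(12, 640, 1250, 3001, 4563),
  treatA   = c(0,   2.5, 0, 1.0, 0),
  treatB   = c(0.5, 0,   0, 4.0, 0),
  treatC   = c(0,   0,   0, 0,   3.0)
)

# 1) wide -> long: one row per (position, treatment) pair
treatments <- c("treatA", "treatB", "treatC")
long <- data.frame(
  position  = rep(wide$position, times = length(treatments)),
  treatment = rep(treatments, each = nrow(wide)),
  conc      = unlist(wide[treatments], use.names = FALSE)
)

# 2) keep only the rows that actually need a line segment
long <- long[long$conc > 0, ]

# 3) derive the plotting columns, using rows of 600 as in the simulated data
rowwidth   <- 600
long$xpos  <- long$position %% rowwidth        # x position within its row
long$level <- 8 - long$position %/% rowwidth   # level 8 is the top row
# rescale concentration linearly onto line widths between 0.5 and 3
rng      <- range(long$conc)
long$lwd <- 0.5 + (long$conc - rng[1]) / diff(rng) * (3 - 0.5)
```

The resulting `long` data frame has one row per segment to draw, so its columns can be passed straight to a vectorized plotting call.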

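OK, imagine you can reduce your data to something this simple. Your main tool for the plot (in base) will be segments(). It is vectorized, so there is no need for loops or anything fancy: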

    # now we draw the base plot:
    par(mar=c(3,3,3,3))
    plot(NULL, type = "n", axes = FALSE, xlab = "", ylab = "", 
         ylim =  c(0,9), xlim = c(-10,609))
    # horizontal segments:
    segments(0,1:8,599,1:8,col=gray(.5))
    # some ticks: (also not pretty)
    segments(rep(c((0:5)*100,599),8), rep(1:8,each=7)-.05, rep(c((0:5)*100,599),8), 
       rep(1:8,each=7)+.05, col=gray(.5))
    # label endpoints:
    text(rep(10,8)+.2,1:8-.2,(7:0)*600,pos=2,cex=.8)
    text(rep(589,8)+.2,1:8-.2,(7:0)*600+599,pos=4,cex=.8)
    # now the amino line segments, remember segments() is vectorized
    segments(AminoData$xpos, AminoData$level, AminoData$xpos, 
       AminoData$level + .5 * AminoData$posorneg, lty = AminoData$oxitype, 
       lwd = AminoData$oxconc, col = as.character(AminoData$color))
    title("mostly you just need to reshape and prepare\nyour data to do this easily in base")

[Figure: PNG output from the plotting code above]

This may be too hand-crafted for some people's tastes, but this is how I go about special plotting.
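The question asked for a break every 1000 residues, while the code above uses rows of 600. One way to make the row width a parameter is to wrap the same approach in a function. The function name `plot_broken_axis` and its arguments are hypothetical; treat this as a sketch, not a polished implementation.

```r
# Hypothetical helper: draw positions on stacked rows, each `rowwidth` long.
# pos: positions in 0..total; lwd, lty, col: per-segment aesthetics (recycled).
plot_broken_axis <- function(pos, total, rowwidth = 1000,
                             height = 0.5, lwd = 1, lty = 1, col = "black") {
  nrows <- ceiling((total + 1) / rowwidth)  # rows needed to cover 0..total
  xpos  <- pos %% rowwidth                  # x position within the row
  level <- nrows - pos %/% rowwidth         # first row is drawn on top

  plot(NULL, axes = FALSE, xlab = "", ylab = "",
       xlim = c(-0.05, 1.05) * rowwidth, ylim = c(0, nrows + 1))
  # one horizontal baseline per row
  segments(0, 1:nrows, rowwidth - 1, 1:nrows, col = gray(0.5))
  # label the start and end position of each row
  starts <- ((nrows - 1):0) * rowwidth
  text(0, 1:nrows, starts, pos = 2, cex = 0.8)
  text(rowwidth - 1, 1:nrows, starts + rowwidth - 1, pos = 4, cex = 0.8)
  # the vertical segments, drawn in one vectorized call
  segments(xpos, level, xpos, level + height, lwd = lwd, lty = lty, col = col)

  # return the computed coordinates invisibly so they can be inspected
  invisible(data.frame(pos = pos, xpos = xpos, level = level))
}

# small usage example: 4563 residues, 1000 per row gives 5 rows
res <- plot_broken_axis(c(5, 999, 1000, 4563), total = 4563)
```

With the simulated data, the call would be along the lines of `plot_broken_axis(AminoData$position, total = 4563, lwd = AminoData$oxconc, lty = AminoData$oxitype, col = as.character(AminoData$color))`.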