Plotting nucleotide chains in R

Time: 2018-10-31 04:34:12

Tags: r ggplot2

I would like to reproduce an example figure in R. The example itself was drawn in Illustrator.

Basically, my data is structured as follows:

> dput(data)
structure(list(FirstPos = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("pos1", 
"pos2"), class = "factor"), SecondPos = structure(c(1L, 1L, 1L, 
2L, 2L, 2L), .Label = c("pos2", "pos3"), class = "factor"), FirstPosseq = structure(c(1L, 
1L, 1L, 2L, 3L, 3L), .Label = c("A", "C", "T"), class = "factor"), 
    SecondPosseq = structure(c(2L, 4L, 1L, 1L, 3L, 4L), .Label = c("A", 
    "C", "G", "T"), class = "factor"), Count = c(10L, 100L, 1L, 
    100L, 100L, 100L)), .Names = c("FirstPos", "SecondPos", "FirstPosseq", 
"SecondPosseq", "Count"), class = "data.frame", row.names = c(NA, 
-6L))

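This is a list of positions (the original position and its partner position). For each row, the "Count" column indicates how likely the two nucleotides are to occur together. I would like a way to show that probability along with the order of the positions (on the x-axis). In the example, I tried to vary the line width according to "Count".

Browsing the ggplot2 library, I could not find a figure like this, and I would appreciate suggestions on packages or approaches I could use.

Thanks!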

1 answer:

Answer 0 (score: 0)

One possible solution is to use the igraph package. Below is a basic example of how to get started with your dataset.

# Assign your data to variable 'dat'.
dat = structure(list(FirstPos = structure(c(1L, 1L, 1L, 2L, 2L, 2L), 
          .Label = c("pos1", "pos2"), class = "factor"), 
          SecondPos = structure(c(1L, 1L, 1L, 2L, 2L, 2L), 
          .Label = c("pos2", "pos3"), class = "factor"), 
          FirstPosseq = structure(c(1L, 1L, 1L, 2L, 3L, 3L), 
          .Label = c("A", "C", "T"), class = "factor"), 
          SecondPosseq = structure(c(2L, 4L, 1L, 1L, 3L, 4L), 
          .Label = c("A", "C", "G", "T"), class = "factor"), 
          Count = c(10L, 100L, 1L, 100L, 100L, 100L)), 
          .Names = c("FirstPos", "SecondPos", "FirstPosseq", 
          "SecondPosseq", "Count"), class = "data.frame", 
          row.names = c(NA, -6L))

library(igraph)

# Create unique names/ids for each vertex in the graph.
dat$node1 = paste(dat$FirstPos, dat$FirstPosseq, sep="_")
dat$node2 = paste(dat$SecondPos, dat$SecondPosseq, sep="_")

# Use the two node columns as an edge-list matrix and create the graph.
g = graph_from_edgelist(as.matrix(dat[, c("node1", "node2")]))

# Add edge weights to graph.
E(g)$weight = dat$Count

# Plot using 'layout_as_tree' to control layout.
plot(g, layout=layout_as_tree(g, root=1), edge.width=log10(E(g)$weight + 1) * 5, 
     vertex.size=30, vertex.color="white", edge.color="black", 
     edge.arrow.mode=0L, vertex.label.color="black")
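
Because the question asks for the position order along the x-axis, the same graph could also be drawn with a hand-built layout matrix instead of `layout_as_tree`. The following is only a minimal sketch. It assumes node names follow the `pos<N>_<nucleotide>` pattern built above. The helper names `pos_idx`, `nuc_idx`, and `lay`, and the fixed A/C/G/T row order, are illustrative choices and not part of the original answer.

```r
library(igraph)

# Same data as in the question, written out as a plain data frame.
dat <- data.frame(
  FirstPos     = c("pos1", "pos1", "pos1", "pos2", "pos2", "pos2"),
  SecondPos    = c("pos2", "pos2", "pos2", "pos3", "pos3", "pos3"),
  FirstPosseq  = c("A", "A", "A", "C", "T", "T"),
  SecondPosseq = c("C", "T", "A", "A", "G", "T"),
  Count        = c(10L, 100L, 1L, 100L, 100L, 100L),
  stringsAsFactors = FALSE
)

# Unique vertex ids, e.g. "pos1_A".
node1 <- paste(dat$FirstPos, dat$FirstPosseq, sep = "_")
node2 <- paste(dat$SecondPos, dat$SecondPosseq, sep = "_")

# Undirected graph; edges keep the row order of the edge list.
g <- graph_from_edgelist(cbind(node1, node2), directed = FALSE)
E(g)$weight <- dat$Count

# Split each vertex name back into position and nucleotide.
parts   <- strsplit(V(g)$name, "_", fixed = TRUE)
pos_idx <- as.integer(sub("pos", "", sapply(parts, `[`, 1)))  # x: 1, 2, 3
nuc_idx <- match(sapply(parts, `[`, 2), c("A", "C", "G", "T")) # y: assumed row order

# One row per vertex, in the same order as V(g).
lay <- cbind(x = pos_idx, y = nuc_idx)

plot(g, layout = lay,
     edge.width = log10(E(g)$weight + 1) * 5,
     vertex.size = 30, vertex.color = "white",
     vertex.label = sapply(parts, `[`, 2),
     vertex.label.color = "black", edge.color = "black")
```

With this layout every position occupies its own column, so the left-to-right order of the plot follows the position order rather than the tree depth.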

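
Since the question is tagged ggplot2, a roughly equivalent figure could probably be built from `geom_segment` and `geom_label`. This is a hedged sketch rather than a tested recipe for the real data. It assumes ggplot2 3.4.0 or later (for the `linewidth` aesthetic; older versions use `size` for lines). The coordinate tables `seg` and `nodes` and the level vectors `pos_levels` and `nuc_levels` are names introduced here for illustration.

```r
library(ggplot2)

# Same data as in the question, written out as a plain data frame.
dat <- data.frame(
  FirstPos     = c("pos1", "pos1", "pos1", "pos2", "pos2", "pos2"),
  SecondPos    = c("pos2", "pos2", "pos2", "pos3", "pos3", "pos3"),
  FirstPosseq  = c("A", "A", "A", "C", "T", "T"),
  SecondPosseq = c("C", "T", "A", "A", "G", "T"),
  Count        = c(10L, 100L, 1L, 100L, 100L, 100L),
  stringsAsFactors = FALSE
)

pos_levels <- c("pos1", "pos2", "pos3")  # order along the x-axis
nuc_levels <- c("A", "C", "G", "T")      # assumed order along the y-axis

# One segment per row: from (FirstPos, FirstPosseq) to (SecondPos, SecondPosseq).
seg <- data.frame(
  x     = match(dat$FirstPos, pos_levels),
  xend  = match(dat$SecondPos, pos_levels),
  y     = match(dat$FirstPosseq, nuc_levels),
  yend  = match(dat$SecondPosseq, nuc_levels),
  Count = dat$Count
)

# One label per distinct (position, nucleotide) pair.
nodes <- unique(data.frame(x = c(seg$x, seg$xend), y = c(seg$y, seg$yend)))
nodes$label <- nuc_levels[nodes$y]

p <- ggplot() +
  geom_segment(data = seg,
               aes(x = x, y = y, xend = xend, yend = yend,
                   linewidth = log10(Count + 1)),
               lineend = "round") +
  geom_label(data = nodes, aes(x = x, y = y, label = label)) +
  scale_x_continuous(breaks = seq_along(pos_levels), labels = pos_levels) +
  scale_y_continuous(breaks = NULL) +
  labs(x = "Position", y = NULL, linewidth = "log10(Count + 1)") +
  theme_minimal()

print(p)
```

The log scaling mirrors the `edge.width` choice in the igraph example, so that a count of 1 stays visible next to counts of 100.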