How to plot multivariate data

Time: 2016-02-07 09:17:23

Tags: r ggplot2

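I have a data frame called mydf. It has a gene column containing unique gene names, plus searched_ columns (the number of individuals in whom each gene was searched for) and found_ columns (the number of individuals in whom it was found). I want to plot this, but I'm not sure what the best way to do it in R is. I'd like to see the searched bars and the found bars stacked on top of each other. Is that possible?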

mydf <- structure(c("FLT3-TKD", "DNMT3A", "IDH1", "190", "0", "190",
                    "5.26315789473684", "NaN", "6.8421052631579", "186", "0", "188",
                    "4.83870967741935", "NaN", "7.97872340425532", "123", "0", "123",
                    "7.31707317073171", "NaN", "8.13008130081301"),
                  .Dim = c(3L, 7L),
                  .Dimnames = list(NULL, c("gene", "searched_man", "found_man",
                                           "searched_cat", "found_cat",
                                           "searched_goat", "found_goat")))

1 answer:

Answer 0 (score: 1)

A base R solution (for the reshaping; the plots below use ggplot2):

Reading in the data

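Running your code, I end up with a matrix full of character strings. That happens because structure() with .Dim builds a matrix, not a data frame. That's no good for plotting, so convert it first: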

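# Turn the character matrix into a data frame
mydf <- as.data.frame(mydf)
# Make every column except gene numeric (as.character() undoes the factors
# that as.data.frame() created by default before R 4.0.0)
mydf[, -1] <- lapply(mydf[, -1], function(x) as.numeric(as.character(x)))
str(mydf)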
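The sketch below shows why the conversion is needed and how to avoid the as.character() step. It rebuilds the question's object and confirms that it is a character matrix. It then converts it with stringsAsFactors = FALSE, which has been the default only since R 4.0.0. This is one possible variant, not the only way to do it.

```r
# Rebuild the object from the question: structure() with .Dim gives a matrix
mydf <- structure(c("FLT3-TKD", "DNMT3A", "IDH1", "190", "0", "190",
  "5.26315789473684", "NaN", "6.8421052631579", "186", "0", "188",
  "4.83870967741935", "NaN", "7.97872340425532", "123", "0", "123",
  "7.31707317073171", "NaN", "8.13008130081301"), .Dim = c(3L, 7L),
  .Dimnames = list(NULL, c("gene", "searched_man", "found_man",
  "searched_cat", "found_cat", "searched_goat", "found_goat")))

# A matrix holds a single type, so every cell (numbers included) is a string
stopifnot(is.matrix(mydf), typeof(mydf) == "character")

# Keep the text columns as character rather than factor
mydf <- as.data.frame(mydf, stringsAsFactors = FALSE)

# Convert all columns except gene; the string "NaN" becomes the numeric NaN
mydf[-1] <- lapply(mydf[-1], as.numeric)
str(mydf)
```

Because no factors are created, as.numeric() can be applied directly to each column.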

Reshape it into long format

# searched_* are columns 2, 4, 6 and found_* are columns 3, 5, 7 (man, cat, goat)
mydf2 <- data.frame(gene = mydf$gene,
                    animal = rep(c('man', 'cat', 'goat'), each = nrow(mydf)),
                    searched = unlist(mydf[, seq(2, ncol(mydf) - 1, 2)]),
                    found = unlist(mydf[, seq(3, ncol(mydf), 2)]),
                    row.names = NULL)

This gives:

      gene animal searched    found
1 FLT3-TKD    man      190 5.263158
2   DNMT3A    man        0      NaN
3     IDH1    man      190 6.842105
4 FLT3-TKD    cat      186 4.838710
5   DNMT3A    cat        0      NaN
6     IDH1    cat      188 7.978723
7 FLT3-TKD   goat      123 7.317073
8   DNMT3A   goat        0      NaN
9     IDH1   goat      123 8.130081
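tidyr is used further down anyway, so the same long format can also be sketched with tidyr::pivot_longer(). This needs tidyr 1.0.0 or later, which is an assumption about your setup. Instead of relying on the searched/found columns alternating, it splits each column name at the underscore. The input below is typed in from the question's values. The result is a tibble whose rows are ordered by gene rather than by animal.

```r
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0

# The question's data as a proper data frame with numeric columns
mydf <- data.frame(
  gene          = c("FLT3-TKD", "DNMT3A", "IDH1"),
  searched_man  = c(190, 0, 190),
  found_man     = c(5.26315789473684, NaN, 6.8421052631579),
  searched_cat  = c(186, 0, 188),
  found_cat     = c(4.83870967741935, NaN, 7.97872340425532),
  searched_goat = c(123, 0, 123),
  found_goat    = c(7.31707317073171, NaN, 8.13008130081301)
)

# Split names like "searched_man" at "_": the first part ("searched"/"found")
# becomes a column name (".value"), the second part goes into `animal`
mydf2 <- pivot_longer(mydf, -gene,
                      names_to = c(".value", "animal"),
                      names_sep = "_")
print(mydf2)
```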

Here's one example of a plot (you gave next to no information about what you want to see):

library(ggplot2)
ggplot(mydf2, aes(x = animal, y = found / searched)) + 
  geom_bar(stat = 'identity') +
  facet_wrap(~gene)

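One caveat about this plot: the found values look like percentages rather than counts. For example, 5.263158 is 10/190 × 100 and 4.838710 is 9/186 × 100, and the NaN entries are 0/0 where searched is 0. The question describes found_ as counts, so treating these values as percentages is an assumption. If it holds, found / searched divides a percentage by a count, and plotting found directly may be clearer.

A sketch, with the long data typed in from the table above. DNMT3A was never searched, so its facet drops out.

```r
library(ggplot2)

# Long data as printed above (found assumed to be a percentage of searched)
mydf2 <- data.frame(
  gene     = rep(c("FLT3-TKD", "DNMT3A", "IDH1"), times = 3),
  animal   = rep(c("man", "cat", "goat"), each = 3),
  searched = c(190, 0, 190, 186, 0, 188, 123, 0, 123),
  found    = c(5.26315789473684, NaN, 6.8421052631579,
               4.83870967741935, NaN, 7.97872340425532,
               7.31707317073171, NaN, 8.13008130081301)
)

# Rows where nothing was searched have no percentage; drop them up front
# instead of letting ggplot2 warn about removed missing values
plot_df <- subset(mydf2, searched > 0)

p <- ggplot(plot_df, aes(x = animal, y = found)) +
  geom_col() +                  # shorthand for geom_bar(stat = "identity")
  facet_wrap(~gene) +
  labs(y = "found (% of searched)")
print(p)
```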

And another example, stacking found and not found so that each bar's total height equals searched:

mydf2$not_found <- mydf2$searched - mydf2$found
mydf3 <- tidyr::gather(mydf2, 'type', 'val', found:not_found)

ggplot(mydf3, aes(x = animal, y = val, fill = type)) + 
  geom_bar(stat = 'identity') +
  facet_wrap(~gene)

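This stacked plot is closest to what the question asks for, but it relies on found being a count. The values look like percentages (for example, 5.263158 is 10/190 × 100), in which case searched - found mixes units.

The sketch below assumes they are percentages and converts them back to counts first. The columns found_n and not_found_n are hypothetical names introduced here. It also swaps gather(), which is superseded in current tidyr, for pivot_longer() (tidyr 1.0.0 or later).

```r
library(ggplot2)
library(tidyr)

# Long data as printed in the answer (gene, animal, searched, found)
mydf2 <- data.frame(
  gene     = rep(c("FLT3-TKD", "DNMT3A", "IDH1"), times = 3),
  animal   = rep(c("man", "cat", "goat"), each = 3),
  searched = c(190, 0, 190, 186, 0, 188, 123, 0, 123),
  found    = c(5.26315789473684, NaN, 6.8421052631579,
               4.83870967741935, NaN, 7.97872340425532,
               7.31707317073171, NaN, 8.13008130081301)
)

# Assumption: found is a percentage of searched; turn it back into a count
mydf2$found_n <- round(mydf2$found / 100 * mydf2$searched)
# Rows with searched == 0 gave 0/0 = NaN; no one was searched, so none found
mydf2$found_n[mydf2$searched == 0] <- 0
mydf2$not_found_n <- mydf2$searched - mydf2$found_n

# pivot_longer() in place of the superseded gather()
mydf3 <- pivot_longer(mydf2, c(found_n, not_found_n),
                      names_to = "type", values_to = "val")

# Each stacked bar adds up to `searched` for that gene and animal
p <- ggplot(mydf3, aes(x = animal, y = val, fill = type)) +
  geom_col() +
  facet_wrap(~gene)
print(p)
```

Unlike gather(), pivot_longer() keeps each original row's two entries next to each other. That changes the row order of mydf3 but not the plot.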