How to get stacked bar plots from a list of data frames, keeping or removing duplicated rows?

Posted: 2017-01-02 17:09:40

Tags: r dataframe ggplot2

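I have a list of data.frames whose rows need to be classified by a threshold, and in the end I need a stacked bar plot of the different categories for each file. However, some rows in my data.frames are duplicated. In some of the bars I need to show these duplicated rows, while in another bar they should be removed, because keeping or removing the duplicates in different categories gives different insights into the results. Depending on which bar of the stacked bar plot it is, I intend to keep the duplicated rows in some categories and remove them in others. I am having some trouble getting the plot I expect. Can anyone point out how to achieve this easily? How should I prepare the data to get a plot that meets my needs? Any ideas?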

Reproducible data.frames:

Qualified <- list(
    hotan = data.frame( begin=c(7,13,19,25,31,37,43,49,55,67,79,103,31,49,55,67), 
                        end=  c(10,16,22,28,34,40,46,52,58,70,82,106,34,52,58,70), 
                        pos.score=c(11,19,8,2,6,14,25,10,23,28,15,17,6,10,23,28)),
    aksu = data.frame( begin=c(12,21,30,39,48,57,66,84,111,30,48,66,84), 
                       end=  c(15,24,33,42,51,60,69,87,114,33,51,69,87), 
                       pos.score=c(5,11,15,23,9,13,2,10,16,15,9,2,10)),
    korla = data.frame( begin=c(6,14,22,30,38,46,54,62,70,78,6,30,46,70), 
                        end=c(11,19,27,35,43,51,59,67,75,83,11,35,51,75), 
                        pos.score=c(9,16,12,3,20,7,11,13,14,17,9,3,7,14))
)

unQualified <- list(
    hotan = data.frame( begin=c(21,33,57,69,81,117,129,177,225,249,333,345,33,81,333), 
                        end=  c(26,38,62,74,86,122,134,182,230,254,338,350,38,86,338), 
                        pos.score=c(7,34,29,14,23,20,11,30,19,17,6,4,34,23,6)),
    aksu = data.frame( begin=c(13,23,33,43,53,63,73,93,113,123,143,153,183,33,63,143), 
                       end=  c(19,29,39,49,59,69,79,99,119,129,149,159,189,39,69,149), 
                       pos.score=c(5,13,32,28,9,11,22,12,23,3,6,8,16,32,11,6)),
    korla = data.frame( begin=c(23,34,45,56,67,78,89,122,133,144,166,188,56,89,144), 
                        end=c(31,42,53,64,75,86,97,130,141,152,174,196,64,97,152), 
                        pos.score=c(3,10,19,17,21,8,18,14,4,9,12,22,17,18,9))
)
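Before plotting, it can help to check how many rows are actually duplicated in each data frame. A minimal sketch with base R's duplicated(), using a copy of Qualified$hotan so the snippet runs on its own; the names dup, n_dup and hotan_unique are illustrative, not from the question.

```r
# Copy of Qualified$hotan from the question
hotan <- data.frame(begin = c(7,13,19,25,31,37,43,49,55,67,79,103,31,49,55,67),
                    end   = c(10,16,22,28,34,40,46,52,58,70,82,106,34,52,58,70),
                    pos.score = c(11,19,8,2,6,14,25,10,23,28,15,17,6,10,23,28))

# duplicated() flags every row that repeats an earlier row (all columns compared)
dup <- duplicated(hotan)
n_dup <- sum(dup)

# Keeping only the first occurrence of each row is equivalent to unique(hotan)
hotan_unique <- hotan[!dup, ]

print(n_dup)
print(nrow(hotan_unique))
```

Applied to the whole lists, sapply(Qualified, function(d) sum(duplicated(d))) gives 4 for each element, and the same call on unQualified gives 3 for each.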

EDIT

I categorized the data like this:

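library(dplyr)

# Stack all data frames into one; .id holds names such as "Qualified.hotan"
singleDF <- 
    bind_rows(c(Qualified = Qualified, Unqualified = unQualified), .id = "id") %>% 
    tidyr::separate(id, c("group", "list")) %>%   # "Qualified.hotan" -> group, list
    mutate(elm = ifelse(pos.score >= 10, "valid", "invalid")) %>%   # threshold at 10
    arrange(list, group, desc(elm))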

res <- singleDF %>% split(list(.$list, .$elm, .$group))

This is the plot I want:

[Image: the desired plot]

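Note that for the valid and invalid categories I need the duplicated rows removed from the data.frame, whereas for the Qualified and UnQualified categories I keep the duplicated rows.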

How can I get the plot I want? How can I do this with the ggplot2 package? Any good ideas? Thanks in advance :)
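The requirement above (duplicates kept for the Qualified/UnQualified bars, dropped for the valid/invalid bars) comes down to counting the same rows with and without dplyr's distinct(). A minimal sketch of the counts behind each half of the plot, assuming the dplyr package and using only the aksu data frames from the question (copied so the snippet is self-contained); aksu, with_dups and without_dups are illustrative names.

```r
library(dplyr)

# aksu elements of Qualified and unQualified, copied from the question
aksu_q <- data.frame(begin = c(12,21,30,39,48,57,66,84,111,30,48,66,84),
                     end   = c(15,24,33,42,51,60,69,87,114,33,51,69,87),
                     pos.score = c(5,11,15,23,9,13,2,10,16,15,9,2,10))
aksu_u <- data.frame(begin = c(13,23,33,43,53,63,73,93,113,123,143,153,183,33,63,143),
                     end   = c(19,29,39,49,59,69,79,99,119,129,149,159,189,39,69,149),
                     pos.score = c(5,13,32,28,9,11,22,12,23,3,6,8,16,32,11,6))

# Stack both groups and classify with the threshold used in the question
aksu <- bind_rows(Qualified = aksu_q, Unqualified = aksu_u, .id = "group") %>%
  mutate(elm = ifelse(pos.score >= 10, "valid", "invalid"))

# Qualified/Unqualified bars: every row counts, duplicates included
with_dups <- count(aksu, group, elm)

# valid/invalid bars: distinct() removes duplicated rows before counting
without_dups <- aksu %>% distinct() %>% count(elm, group)

print(with_dups)
print(without_dups)
```

For aksu, the Qualified bar holds 13 rows (8 valid, 5 invalid) and the Unqualified bar 16 rows (10 valid, 6 invalid); after distinct(), the valid bar holds 14 rows (6 Qualified, 8 Unqualified) and the invalid bar 8 rows (3 Qualified, 5 Unqualified).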

2 Answers:

Answer 0 (score: 3)

Maybe something like this?

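library(tidyverse)
library(cowplot)
theme_set(theme_grey())   # older cowplot versions change the default theme on load; restore ggplot2's

# Left: all aksu rows (duplicates kept), one bar per group, stacked by elm
p1 <- ggplot(filter(singleDF, list == "aksu"), 
             aes(group, fill = elm)) +
  geom_bar() +
  ylim(0, 16) +
  theme(legend.position = 'top', legend.title = element_blank(), axis.title.x = element_blank())

# Right: distinct() drops the duplicated rows, one bar per elm, stacked by group
p2 <- ggplot(filter(singleDF, list == "aksu") %>% distinct(), 
             aes(elm, fill = group)) +
  geom_bar() +
  scale_fill_discrete(h.start = 90) +   # shift the hues so the colours differ from p1
  ylim(0, 16) +
  theme(legend.position = 'top', legend.title = element_blank(), axis.title.x = element_blank())

# Place the two panels side by side
plot_grid(p1, p2, align = 'v', nrow = 1)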

[Image: output of the code above]

Answer 1 (score: 2)

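If you want to do this for every element of the list, you can use the tidyverse package and wrap @Axeman's answer in a function. I modified @Axeman's code to get the look you want, although I didn't use cowplot, so I substituted gridExtra.

EDIT: An easy fix to get the desired plot: just grid.arrange the result of map in one line. I also adjusted the plots to line up better with your desired output. I used geom_label with stat="count" and the special variable ..count.. to get the counts; you can switch it to geom_text if you like.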

[Image: the resulting plots]
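A rough sketch of the approach this answer describes (wrap @Axeman's two plots in a function, map over the list elements, arrange everything with gridExtra, and label the bars with stat = "count"). It is not the answerer's original code. It assumes the dplyr, ggplot2, purrr and gridExtra packages; plot_pair() and count_labels() are hypothetical helper names, and only the aksu and korla data are copied in so the snippet runs on its own. It uses after_stat(count), which replaces the ..count.. notation deprecated in ggplot2 3.4.0.

```r
library(dplyr)
library(ggplot2)
library(purrr)
library(gridExtra)

# aksu and korla elements from the question (hotan omitted for brevity)
Qualified <- list(
  aksu  = data.frame(begin = c(12,21,30,39,48,57,66,84,111,30,48,66,84),
                     end   = c(15,24,33,42,51,60,69,87,114,33,51,69,87),
                     pos.score = c(5,11,15,23,9,13,2,10,16,15,9,2,10)),
  korla = data.frame(begin = c(6,14,22,30,38,46,54,62,70,78,6,30,46,70),
                     end   = c(11,19,27,35,43,51,59,67,75,83,11,35,51,75),
                     pos.score = c(9,16,12,3,20,7,11,13,14,17,9,3,7,14)))
unQualified <- list(
  aksu  = data.frame(begin = c(13,23,33,43,53,63,73,93,113,123,143,153,183,33,63,143),
                     end   = c(19,29,39,49,59,69,79,99,119,129,149,159,189,39,69,149),
                     pos.score = c(5,13,32,28,9,11,22,12,23,3,6,8,16,32,11,6)),
  korla = data.frame(begin = c(23,34,45,56,67,78,89,122,133,144,166,188,56,89,144),
                     end   = c(31,42,53,64,75,86,97,130,141,152,174,196,64,97,152),
                     pos.score = c(3,10,19,17,21,8,18,14,4,9,12,22,17,18,9)))

# Long format: one row per record, tagged with its group and list element
singleDF <- bind_rows(Qualified   = bind_rows(Qualified, .id = "list"),
                      Unqualified = bind_rows(unQualified, .id = "list"),
                      .id = "group") %>%
  mutate(elm = ifelse(pos.score >= 10, "valid", "invalid"))

# Hypothetical helper: a fresh count-label layer, centred in each stacked segment
count_labels <- function() {
  geom_label(stat = "count", aes(label = after_stat(count)),
             position = position_stack(vjust = 0.5))
}

# Hypothetical helper: the two panels for one list element
plot_pair <- function(site) {
  d <- filter(singleDF, list == site)
  p1 <- ggplot(d, aes(group, fill = elm)) +            # duplicates kept
    geom_bar() + count_labels() + labs(title = site, x = NULL)
  p2 <- ggplot(distinct(d), aes(elm, fill = group)) +  # duplicates removed
    geom_bar() + count_labels() + labs(x = NULL)
  list(p1, p2)
}

# One pair of panels per list element, flattened and drawn in a single row
plots <- unlist(map(unique(singleDF$list), plot_pair), recursive = FALSE)
grid.arrange(grobs = plots, nrow = 1)
```

Creating the label layer through a function gives every plot its own layer object instead of sharing one between plots; swapping geom_label for geom_text inside count_labels() gives plain text labels.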