Grouping data from a data frame

Posted: 2015-01-27 17:12:02

Tags: r ggplot2 dataframe

My data looks like this:

first second data_col1 data_col2 data_col3
lu    NA     <number>  <number>  <number>
lu    NA     <number>  <number>  <number>
lu    NA     <number>  <number>  <number>
lu    NA     <number>  <number>  <number>
lu    mult   <number>  <number>  <number>
lu    mult   <number>  <number>  <number>
lu    mult   <number>  <number>  <number>
lu    mult   <number>  <number>  <number>
mult  NA     <number>  <number>  <number>
mult  NA     <number>  <number>  <number>
mult  NA     <number>  <number>  <number>
mult  NA     <number>  <number>  <number>

and so on.

I want to group this data by the first two columns and draw a separate plot for each group.

I tried this:

library(ggplot2)

comb <- unique(total.df[c(1,2)])
apply(comb, 1, function(x) {
  # %in% lets NA match NA; == would return NA for the <NA> combinations
  d <- total.df[total.df$guess == FALSE &
                total.df$second %in% x[2] &
                total.df$first %in% x[1] &
                total.df$tasks == "tasks_const", ]
  # Refer to columns by bare name inside aes(), not via d$...
  p <- ggplot(d, aes(x = platform, y = time,
                     group = as.factor(sched),
                     colour = as.factor(sched))) +
    geom_point() + geom_line()
  ggsave(filename = sprintf("/tmp/a_%s_%s.png", x[1], x[2]), plot = p)
})

My comb data frame looks like this:

        first   second
1        mult     <NA>
121        lu     mult
241        lu     <NA>
361      heat     mult
481      heat       lu
601      heat     <NA>
721  cholesky     mult
841  cholesky       lu
961  cholesky     heat
1081 cholesky     <NA>
1201 pipeline     mult
1321 pipeline       lu
1441 pipeline     heat
1561 pipeline cholesky
1681 pipeline     <NA>
1801      gen     mult
1921      gen       lu
2041      gen     heat
2161      gen cholesky
2281      gen pipeline
2401      gen     <NA>
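The <NA> rows in comb are easy to mishandle: in R, comparing anything with NA using == yields NA rather than TRUE or FALSE, so a filter such as total.df$second == x[2] cannot select the rows where second is missing. The following minimal base-R illustration uses a made-up vector and data frame (not the asker's data); %in% is one way to make NA match NA.

```r
# Made-up values standing in for the 'second' column
second <- c(NA, "mult", "lu")
key <- NA_character_   # e.g. x[2] for a comb row whose second is <NA>

second == key          # NA NA NA: no row is ever TRUE
second %in% key        # TRUE FALSE FALSE: NA matches NA

df <- data.frame(second = second, time = 1:3, stringsAsFactors = FALSE)
bad  <- df[df$second == key, ]    # NA in a logical index gives all-NA rows
good <- df[df$second %in% key, ]  # just the row where second is NA
```
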

facet_wrap almost solves my problem, but I want each plot in a separate image so I can actually see what is in it. With facet_wrap, every panel is too small.

Here is the code using facet_wrap:

ggplot(total.df, aes(x = platform, y = time,
                     group = as.factor(sched),
                     colour = as.factor(sched))) +
  geom_point() + geom_line() + facet_wrap(first ~ second)
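One way to get one full-size image per (first, second) combination instead of facets is to split() the data frame and save each piece with ggsave(). This is a sketch on hypothetical toy data; the values, the "_" separator in file names, the output directory and the image size are all assumptions. split() drops rows whose grouping value is NA, so the two columns are first pasted into a string key, which turns NA into "NA".

```r
library(ggplot2)

# Hypothetical toy data; column names follow the question, values are made up
total.df <- data.frame(
  first    = rep(c("lu", "lu", "mult"), each = 4),
  second   = rep(c(NA, "mult", NA), each = 4),
  platform = rep(c("p1", "p2"), times = 6),
  sched    = rep(c("s1", "s1", "s2", "s2"), times = 3),
  time     = seq_len(12),
  stringsAsFactors = FALSE
)

# paste() turns NA into "NA", so the <NA> combinations form their own groups
groups <- split(total.df, paste(total.df$first, total.df$second, sep = "_"))

out_dir <- tempdir()  # assumption: a temp dir instead of /tmp
for (key in names(groups)) {
  d <- groups[[key]]
  p <- ggplot(d, aes(x = platform, y = time,
                     group = factor(sched), colour = factor(sched))) +
    geom_point() + geom_line()
  # Each file gets the whole canvas; 8 x 6 inches is an arbitrary choice
  ggsave(file.path(out_dir, sprintf("a_%s.png", key)), plot = p,
         width = 8, height = 6)
}
```

With this toy data the loop writes a_lu_NA.png, a_lu_mult.png and a_mult_NA.png.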

1 answer:

Answer (score: 2)

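I'd suggest drawing each plot on a separate page of a single PDF file. I'd also suggest using data.table, since it makes this kind of grouped subsetting simpler: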

library(data.table)
library(ggplot2)

total.dt <- data.table(total.df)
setkey(total.dt, first, second)
# All distinct (first, second) combinations
comb <- unique(total.dt[, list(first, second)])

pdf("test.pdf")  # one multi-page PDF; each print() below adds a page
for (n in seq_len(nrow(comb))) {
  # Keyed join on (first, second), then apply the remaining filters
  d <- total.dt[comb[n, ]][guess == FALSE & tasks == "tasks_const"]
  print(ggplot(d, aes(x = platform, y = time,
                      group = as.factor(sched),
                      colour = as.factor(sched))) +
          geom_point() + geom_line() +
          ggtitle(sprintf("first=%s, second=%s", comb[n, first], comb[n, second])))
}
dev.off()
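To try this approach without the original data, here is a reproducible sketch on made-up data that has the columns the code above refers to (first, second, platform, sched, time, guess, tasks); the values and the output path are assumptions. It also shows the keyed join on its own: total.dt[.("lu", "mult")] looks up rows by the key columns first and second, in key order.

```r
library(data.table)
library(ggplot2)

# Hypothetical toy data; only the column names come from the question
total.dt <- data.table(
  first    = rep(c("lu", "lu", "mult"), each = 4),
  second   = rep(c(NA, "mult", NA), each = 4),
  platform = rep(c("p1", "p2"), times = 6),
  sched    = rep(c("s1", "s1", "s2", "s2"), times = 3),
  time     = seq_len(12),
  guess    = FALSE,          # recycled to every row
  tasks    = "tasks_const"   # recycled to every row
)
setkey(total.dt, first, second)
comb <- unique(total.dt[, list(first, second)])

# Keyed join: values are matched against the key columns (first, second)
lu_mult <- total.dt[.("lu", "mult")]

out_file <- file.path(tempdir(), "test.pdf")  # assumption: temp output path
pdf(out_file)
for (n in seq_len(nrow(comb))) {
  d <- total.dt[comb[n]][guess == FALSE & tasks == "tasks_const"]
  print(ggplot(d, aes(x = platform, y = time,
                      group = as.factor(sched),
                      colour = as.factor(sched))) +
          geom_point() + geom_line() +
          ggtitle(sprintf("first=%s, second=%s", comb[n, first], comb[n, second])))
}
dev.off()
```
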