Group by and sum by a factor

Time: 2014-08-17 14:23:30

Tags: r ggplot2 sum aggregate

I have data like this:

> head(df)
                  Date IsWin
20 2014-07-13 00:00:00  True
21 2014-08-01 00:00:00  True
22 2014-08-05 00:00:00 False
23 2014-06-28 00:00:00  True
24 2014-05-31 00:00:00  True
25 2014-06-06 00:00:00  True

I want to group by date and sum by IsWin (a factor that should count as 1 or -1).

I have read this post, but it doesn't really deal with factors, so I don't know how to apply it: How to group a data.frame by date?

Ultimately, I'd like to pass the grouped and summed data to a bar chart that shows the number of wins or losses, as in ggplot2 and a Stacked Bar Chart with Negative Values.

The following prints a table that is very helpful for seeing what I want; however, I'd like to turn it into a bar chart for a better visual:

> table(df[,1],df[,2])

                      False True
  2014-05-25 00:00:00     1    0
  2014-05-29 00:00:00     1    0
  2014-05-30 00:00:00     2    0
  2014-05-31 00:00:00     0    1
  2014-06-06 00:00:00     0    1
  2014-06-13 00:00:00     1    0
  2014-06-14 00:00:00     0    1
  2014-06-18 00:00:00     1    0
  2014-06-19 00:00:00     0    1
  2014-06-23 00:00:00     1    0
  2014-06-24 00:00:00     1    0
  2014-06-25 00:00:00     1    0
  2014-06-27 00:00:00     0    1
  2014-06-28 00:00:00     1    2
  2014-07-02 00:00:00     1    0
  2014-07-11 00:00:00     1    0
  2014-07-13 00:00:00     0    2
  2014-07-31 00:00:00     0    1
  2014-08-01 00:00:00     0    1
  2014-08-05 00:00:00     1    0
  2014-08-07 00:00:00     1    0
  2014-08-12 00:00:00     0    1

Here is my actual data structure:

df <- structure(list(Date = c("2014-07-13 00:00:00", "2014-08-01 00:00:00", 
"2014-08-05 00:00:00", "2014-06-28 00:00:00", "2014-05-31 00:00:00", 
"2014-06-06 00:00:00", "2014-06-14 00:00:00", "2014-05-25 00:00:00", 
"2014-06-24 00:00:00", "2014-06-28 00:00:00", "2014-05-30 00:00:00", 
"2014-06-18 00:00:00", "2014-07-02 00:00:00", "2014-07-11 00:00:00", 
"2014-05-29 00:00:00", "2014-06-19 00:00:00", "2014-07-31 00:00:00", 
"2014-06-27 00:00:00", "2014-06-23 00:00:00", "2014-05-30 00:00:00", 
"2014-07-13 00:00:00", "2014-08-12 00:00:00", "2014-06-13 00:00:00", 
"2014-06-25 00:00:00", "2014-06-28 00:00:00", "2014-08-07 00:00:00"
), IsWin = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L
), .Label = c("False", "True"), class = "factor")), .Names = c("Date", 
"IsWin"), row.names = 20:45, class = "data.frame")

2 Answers:

Answer 0: (score: 1)

Try:

library(ggplot2)

# Count wins and losses per date, then convert the table to a long data frame
ddf2 = data.frame(with(df, table(Date, IsWin)))

ggplot(ddf2)+
    geom_bar(aes(x=Date, y=Freq, fill=IsWin), stat='identity', position='dodge')+
    theme(axis.text.x=element_text(angle=45, size=10, hjust=1, vjust=1))
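
To see why `Freq` can be mapped straight to `y`, it may help to look at what `data.frame(table(...))` returns. The sketch below uses a small made-up data frame (`toy` is a hypothetical name, not the asker's `df`) and only base R, so treat it as an illustration of the shape of the result rather than of the real data.

```r
# Hypothetical toy data, only for illustration (not the asker's df)
toy <- data.frame(
  Date  = c("2014-06-28", "2014-06-28", "2014-06-28", "2014-07-13"),
  IsWin = factor(c("True", "False", "True", "True"), levels = c("False", "True"))
)

# table() cross-tabulates Date x IsWin; data.frame() turns the table
# into long format with one row per (Date, IsWin) pair and a Freq column
long <- data.frame(with(toy, table(Date, IsWin)))
print(long)
```

Each date appears once per level of `IsWin`, including combinations with a count of zero, which is why the dodged bars line up side by side for every date.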


Edit: for negative bars:

ddf2$new = ifelse(ddf2$IsWin=='True', 1,-1)

ggplot(ddf2)+
    geom_bar(data=ddf2[ddf2$new>0,], aes(x=Date, y=Freq*new, fill=IsWin), stat='identity')+
    geom_bar(data=ddf2[ddf2$new<0,], aes(x=Date, y=Freq*new, fill=IsWin), stat='identity')+
    theme(axis.text.x=element_text(angle=45, size=10, hjust=1, vjust=1))
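
If a single net value per date is wanted instead (wins minus losses, which is one reading of "sum by IsWin as 1 or -1"), a base R sketch could look like the following. The data frame `toy` and the column name `score` are hypothetical and chosen only for this illustration.

```r
# Hypothetical toy data, only for illustration (not the asker's df)
toy <- data.frame(
  Date  = c("2014-06-28", "2014-06-28", "2014-06-28", "2014-08-05"),
  IsWin = factor(c("True", "False", "True", "False"), levels = c("False", "True"))
)

# Map the factor to +1 (win) or -1 (loss)
toy$score <- ifelse(toy$IsWin == "True", 1, -1)

# Sum the scores within each date
net <- aggregate(score ~ Date, data = toy, FUN = sum)
print(net)
```

The resulting `net` has one row per date, so it could be passed to ggplot with `score` on the y axis to draw one bar per date that points up or down.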


Answer 1: (score: 1)

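How about this? You can use group_by() from the dplyr package to group the data as shown below, then summarise (count) how many TRUE and FALSE values occur on each date. With that data frame you can create a stacked bar chart.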

library(dplyr)
library(ggplot2)

### Create a sample data set
dates <- rep(c("2014-08-01", "2014-08-02"), each = 10, times = 1)
win <- rep(c("TRUE", "FALSE", "FALSE", "TRUE", "TRUE"), each = 1, times = 4)

foo <- data.frame(cbind(dates, win))
foo$dates <- as.character(foo$dates)

ana <- foo %>%
         group_by(dates, win) %>%
         summarize(count = n())

# ana
# Source: local data frame [4 x 3]
# Groups: dates

#        dates   win count
# 1 2014-08-01 FALSE     4
# 2 2014-08-01  TRUE     6
# 3 2014-08-02 FALSE     4
# 4 2014-08-02  TRUE     6

bob <- ggplot(ana, aes(x=dates, y=count, fill=win)) +
         geom_bar(stat="identity") +
         scale_y_continuous(breaks = seq(0,10,by = 1))

Updated option

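After seeing the comments, I came up with this idea. It has two new things: one converts the positive counts to negative values when win is FALSE, and the other is a new ggplot call. I'm sure there is a better way, but I wanted to put the idea out here.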

ana <- foo %>%
    group_by(dates, win) %>%
    summarize(count = n())

# If there is FALSE in ith row in the win column, make the value of ith row in the
# count column negative. If you can avoid a loop and achieve the same goal, that
# may be the best option. But, I do not have any ideas in my mind yet.

for(i in 1:nrow(ana)){

    if(ana$win[[i]] == "FALSE"){

    ana$count[[i]] <- -abs(ana$count[[i]])

    }
}

bob <- ggplot(data=ana, aes(x=dates, y=count, fill=win)) +
       geom_bar(stat="identity", position=position_dodge())
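
The loop above can probably be avoided, since `ifelse()` is vectorised. A minimal sketch, assuming a data frame shaped like `ana` (the stand-in `ana_toy` below is hypothetical and built by hand so the example runs without dplyr):

```r
# Hypothetical stand-in for the summarised data frame `ana` above
ana_toy <- data.frame(
  dates = c("2014-08-01", "2014-08-01", "2014-08-02", "2014-08-02"),
  win   = c("FALSE", "TRUE", "FALSE", "TRUE"),
  count = c(4L, 6L, 4L, 6L)
)

# Vectorised sign flip: negate count wherever win is "FALSE",
# leave it unchanged otherwise
ana_toy$count <- ifelse(ana_toy$win == "FALSE",
                        -abs(ana_toy$count),
                        ana_toy$count)
print(ana_toy)
```

The same `ifelse()` expression could also be placed inside a `mutate()` step at the end of the dplyr pipeline, which would keep the whole transformation in one chain.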

Is this what you are after?