Group data by 2 columns and perform a calculation in R

Time: 2014-07-23 10:32:00

Tags: r

Suppose I have a data.frame:

> ID=c("S","B","S","S","B","S","S","S","B","B","S")     #"S" means Sell, "B" means Buy
> quant=c(3,2,5,1,2,10,4,4,17,6,6)
> time=c(1,2,3,3,4,4,5,5,5,6,6)
> test=data.frame(ID,quant,time)
> test
     ID quant time
  1   S     3    1
  2   B     2    2
  3   S     5    3
  4   S     1    3
  5   B     2    4
  6   S    10    4
  7   S     4    5
  8   S     4    5
  9   B    17    5
  10  B     6    6
  11  S     6    6

I need to calculate the net position at each time, so that my final output is:

    ID quant time
  1  S     3    1
  2  B     2    2
  3  S     6    3    #Sell 5+1=6
  4  S     8    4    #Sell 10-2=8
  5  B     9    5    #Buy 17-4-4=9
  6  B     0    6    #this row is optional because the position is 0 so I don't need it

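I think I need to group the data frame by time first and then by ID, but how do I perform the net position calculation? I tried using aggregate, but it seems to work only for grouping by a single column? Thanks!

3 Answers:

Answer 0: (score: 3)

Another idea, which may look a bit fragile but seems to work for this specific case: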

tmp = diff(xtabs(quant ~ ID + time, test))
data.frame(ID = c(ifelse(tmp > 0, "S", "B")), 
           time = colnames(tmp), 
           quant = c(abs(tmp)))
#  ID time quant
#1  S    1     3
#2  B    2     2
#3  S    3     6
#4  S    4     8
#5  B    5     9
#6  B    6     0
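
To see why this works, and why it could be called fragile, it may help to look at the intermediate table. The sketch below rebuilds the question's data and inspects it; the names tab and net are illustrative only. It assumes both "B" and "S" occur in the data, so the table has exactly two rows in alphabetical order and diff() returns the S row minus the B row.

```r
# Rebuild the example data from the question
ID    <- c("S","B","S","S","B","S","S","S","B","B","S")
quant <- c(3,2,5,1,2,10,4,4,17,6,6)
time  <- c(1,2,3,3,4,4,5,5,5,6,6)
test  <- data.frame(ID, quant, time)

# Sum quant for every ID/time combination; rows are ID, columns are time
tab <- xtabs(quant ~ ID + time, test)
print(tab)

# Rows are ordered alphabetically ("B", then "S"), so diff() on the
# two-row table gives S minus B for each time column
net <- diff(tab)
print(net)
```

A positive entry in net means more was sold than bought at that time, which is why the sign decides between "S" and "B". If one of the two IDs were missing from the data, the table would have a single row and diff() would have nothing to subtract.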

The classic approach in base R:

do.call(rbind, 
        lapply(split(test, test$time), 
               function(x) { 
                  s = sum(x[["quant"]][x[["ID"]] == "S"])
                  b = sum(x[["quant"]][x[["ID"]] == "B"])
                  data.frame(time = x$time[1], 
                             quant = abs(s - b),
                             ID = if(s > b) "S" else "B")
               }))
#  time quant ID
#1    1     3  S
#2    2     2  B
#3    3     6  S
#4    4     8  S
#5    5     9  B
#6    6     0  B
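
Since the question mentions aggregate, here is a hedged sketch of how it could be used as well. The names by_both, signed, net and res are illustrative, and counting sells as positive and buys as negative is just one possible convention. The formula interface accepts more than one grouping column, and a signed quantity reduces the problem to a single sum per time.

```r
# Rebuild the example data from the question
ID    <- c("S","B","S","S","B","S","S","S","B","B","S")
quant <- c(3,2,5,1,2,10,4,4,17,6,6)
time  <- c(1,2,3,3,4,4,5,5,5,6,6)
test  <- data.frame(ID, quant, time)

# aggregate can group by two columns through its formula interface
by_both <- aggregate(quant ~ ID + time, data = test, FUN = sum)

# For the net position: count sells as positive and buys as negative,
# then sum per time
signed <- ifelse(test$ID == "S", test$quant, -test$quant)
net <- aggregate(signed, by = list(time = test$time), FUN = sum)

# aggregate names the summed column "x" when given a plain vector
res <- data.frame(ID    = ifelse(net$x > 0, "S", "B"),
                  quant = abs(net$x),
                  time  = net$time)
print(res)
```

As in the approaches above, a net position of exactly 0 is labelled "B" here; such rows could be dropped with res[res$quant != 0, ] if they are not needed.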

Answer 1: (score: 2)

Using data.table:

library(data.table)
setDT(test)[, list(quant = sum(quant[ID == "B"]) - sum(quant[ID == "S"])),
            by = time][, list(ID = ifelse(quant > 0, "B", "S"), quant = abs(quant), time)]

#    ID quant time
# 1:  S     3    1
# 2:  B     2    2
# 3:  S     6    3
# 4:  S     8    4
# 5:  B     9    5
# 6:  S     0    6

Answer 2: (score: 2)

Here is a solution with dplyr.

library(dplyr)
result <- test %>% 
      group_by(time) %>%
      summarise(quant = sum(quant[ID == "B"]) - sum(quant[ID == "S"])) %>% 
      mutate(ID = c("S", "B")[(quant >= 0) + 1], quant = abs(quant))

#   time quant ID
# 1    1     3  S
# 2    2     2  B
# 3    3     6  S
# 4    4     8  S
# 5    5     9  B
# 6    6     0  B

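What does c("S", "B")[(quant >= 0) + 1] mean?

  • The expression (quant >= 0) creates a logical vector indicating whether quant is greater than or equal to 0.
  • In the next step, 1 is added to this vector. When an arithmetic operator is applied to a logical vector, FALSE is converted to 0 and TRUE is converted to 1. This expression therefore produces a vector of 1s and 2s.
  • That vector is used as an index into c("S", "B"), which produces a character vector of "S"s and "B"s.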
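
The three steps can be tried one at a time. In this small sketch the values of quant are the signed net positions (buys minus sells) from the example, and the names is_buy, idx and side are illustrative only.

```r
# Signed net positions per time: buys minus sells
quant <- c(-3, 2, -6, -8, 9, 0)

# Step 1: logical vector, TRUE where the net position is zero or positive
is_buy <- quant >= 0
print(is_buy)   # FALSE TRUE FALSE FALSE TRUE TRUE

# Step 2: adding 1 coerces FALSE to 0 and TRUE to 1, giving 1s and 2s
idx <- is_buy + 1
print(idx)      # 1 2 1 1 2 2

# Step 3: use the 1s and 2s to pick "S" (position 1) or "B" (position 2)
side <- c("S", "B")[idx]
print(side)     # "S" "B" "S" "S" "B" "B"
```

The same result could be written as ifelse(quant >= 0, "B", "S"); the indexing form simply avoids the function call.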