Suppose I have a data.frame:
> ID=c("S","B","S","S","B","S","S","S","B","B","S") #"S" means Sell, "B" means Buy
> quant=c(3,2,5,1,2,10,4,4,17,6,6)
> time=c(1,2,3,3,4,4,5,5,5,6,6)
> test=data.frame(ID,quant,time)
> test
ID quant time
1 S 3 1
2 B 2 2
3 S 5 3
4 S 1 3
5 B 2 4
6 S 10 4
7 S 4 5
8 S 4 5
9 B 17 5
10 B 6 6
11 S 6 6
I need to calculate the net position at each time, so that my final output is:
ID quant time
1 S 3 1
2 B 2 2
3 S 6 3 #Sell 5+1=6
4 S 8 4 #Sell 10-2=8
5 B 9 5 #Buy 17-4-4=9
6 B 0 6 #this row is optional because the position is 0 so I don't need it
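I think I need to group the data frame by time first and then by ID, but how do I then compute the net position? I tried aggregate, but it seems to work only when grouping by a single column. Thanks!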
Answer 0 (score: 3)
Another idea, which may look a bit fragile but seems to work for this specific case:
tmp = diff(xtabs(quant ~ ID + time, test))
data.frame(ID = c(ifelse(tmp > 0, "S", "B")),
time = colnames(tmp),
quant = c(abs(tmp)))
# ID time quant
#1 S 1 3
#2 B 2 2
#3 S 3 6
#4 S 4 8
#5 B 5 9
#6 B 6 0
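To see why this works, and why it may be fragile, it can help to inspect the intermediate objects. The following is only a sketch on the question's data; the names tab and net are made up for illustration. It appears to depend on there being exactly two IDs and on "B" sorting before "S", so that the row difference is sells minus buys.

```r
# Rebuild the example data from the question
test <- data.frame(
  ID    = c("S","B","S","S","B","S","S","S","B","B","S"),
  quant = c(3,2,5,1,2,10,4,4,17,6,6),
  time  = c(1,2,3,3,4,4,5,5,5,6,6)
)

# xtabs() sums quant for every ID/time combination;
# rows follow the sorted factor levels, so "B" comes before "S"
tab <- xtabs(quant ~ ID + time, test)
print(tab)

# diff() on a matrix subtracts consecutive rows: row "S" minus row "B"
net <- diff(tab)
print(net)
```

A positive entry in net is therefore a net sell and a negative entry a net buy. If the data contained only one ID, tab would have a single row and diff() would have nothing to subtract.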
The classic approach in base R:
do.call(rbind,
lapply(split(test, test$time),
function(x) {
s = sum(x[["quant"]][x[["ID"]] == "S"])
b = sum(x[["quant"]][x[["ID"]] == "B"])
data.frame(time = x$time[1],
quant = abs(s - b),
ID = if(s > b) "S" else "B")
}))
# time quant ID
#1 1 3 S
#2 2 2 B
#3 3 6 S
#4 4 8 S
#5 5 9 B
#6 6 0 B
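Since the question mentions aggregate, here is a hedged sketch of how the same result could be reached with it: give sells a positive sign and buys a negative sign, then sum per time. The column name signed and the objects net and result are assumptions made for illustration; a zero position is dropped at the end because the question calls that row optional.

```r
# Rebuild the example data from the question
test <- data.frame(
  ID    = c("S","B","S","S","B","S","S","S","B","B","S"),
  quant = c(3,2,5,1,2,10,4,4,17,6,6),
  time  = c(1,2,3,3,4,4,5,5,5,6,6)
)

# Signed quantity: sells count as positive, buys as negative
test$signed <- ifelse(test$ID == "S", test$quant, -test$quant)

# Once the sign carries the ID information, grouping by time alone is enough
net <- aggregate(signed ~ time, data = test, FUN = sum)

# Turn the signed total back into an ID and an absolute quantity
result <- data.frame(
  ID    = ifelse(net$signed > 0, "S", "B"),
  quant = abs(net$signed),
  time  = net$time
)

# Optionally drop times where the net position is zero
result <- result[result$quant != 0, ]
print(result)
```

Note that aggregate also accepts several grouping variables in its formula interface, for example quant ~ time + ID, so it is not limited to one column.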
Answer 1 (score: 2)
Using data.table:
library(data.table)
setDT(test)[, list(quant = sum(quant[ID == "B"]) - sum(quant[ID == "S"])),
by = time][, list(ID = ifelse(quant > 0, "B", "S"), quant = abs(quant), time)]
# ID quant time
# 1: S 3 1
# 2: B 2 2
# 3: S 6 3
# 4: S 8 4
# 5: B 9 5
# 6: S 0 6
Answer 2 (score: 2)
Here is a solution with dplyr.
library(dplyr)
result <- test %>%
group_by(time) %>%
summarise(quant = sum(quant[ID == "B"]) - sum(quant[ID == "S"])) %>%
mutate(ID = c("S", "B")[(quant >= 0) + 1], quant = abs(quant))
# time quant ID
# 1 1 3 S
# 2 2 2 B
# 3 3 6 S
# 4 4 8 S
# 5 5 9 B
# 6 6 0 B
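What does c("S", "B")[(quant >= 0) + 1] mean?
(quant >= 0) creates a logical vector indicating whether quant is greater than or equal to 0. Adding 1 coerces FALSE to 0 and TRUE to 1, so the expression yields a vector of 1s and 2s. That vector is then used as an index into c("S", "B"), which produces a character vector of "S"s and "B"s.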
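The indexing trick can be traced step by step. This is a small sketch in base R; the quant values are the net positions (buys minus sells) from the example, and the names is_buy, idx and labels are made up for illustration.

```r
# Net positions per time, computed as buys minus sells
quant <- c(-3, 2, -6, -8, 9, 0)

# Logical vector: TRUE where the net position is a buy (or zero)
is_buy <- quant >= 0

# Arithmetic coerces FALSE/TRUE to 0/1, so adding 1 gives positions 1 or 2
idx <- is_buy + 1

# Index into the two labels: position 1 is "S", position 2 is "B"
labels <- c("S", "B")[idx]
print(labels)
```

Because the comparison is >=, a zero position is labelled "B" here; using > instead would label it "S".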