I have a data frame that I want to clean up by removing offsetting lines (boxed positions) and netting the rest.
Here is the source table:
Type Name Strike Maturity Nominal
Call Amazon 10 10/12/2018 1000
Put Amazon 10 10/12/2018 1000
Call Ebay 8 2/8/2018 800
Put Ebay 8 2/8/2018 500
Call Facebook 5 5/5/2018 900
Call Google 2 23/4/2018 250
Put Google 2 23/4/2018 350
Call Microsoft 2 19/3/2018 250
Put Microsoft 2.5 19/3/2018 350
Put Ebay 8 2/8/2018 100
This is the result the code should produce:
Type Name Strike Maturity Nominal
Call Ebay 8 2/8/2018 200
Call Facebook 5 5/5/2018 900
Put Google 2 23/4/2018 100
Call Microsoft 2 19/3/2018 250
Put Microsoft 2.5 19/3/2018 350
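I am trying to write code in R that performs these 3 tasks:
1 // Delete all pairs that offset each other. A pair that offsets each other is one that meets these two criteria:
Example: the 2 "Amazon" rows are removed from the table.
2 // Net the nominal on rows that do not fully offset each other. A pair that does not fully offset each other is one that meets the following two criteria:
Example: the "Ebay" rows are netted into a Call of 200, and the 2 "Google" rows are netted into a Put of 100.
3 // Do nothing to all the other rows.
Example: the 2 "Microsoft" rows. They have different strikes, so they should not be netted at all.
See my first attempt below. My idea was to create a new column holding a unique key, sort by it alphabetically, and then test each row one by one. I found this very laborious, so I was wondering whether someone could help me find a simpler and more efficient solution? Many thanks!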
library(data.table)
dt <- data.table(Type=c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put","Put"),
Name=c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google", "Microsoft", "Microsoft","Ebay"),
Strike=c(10,10,8,8,5,2,2,2,2.5,8),
Maturity=c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018", "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018","2/8/2018"),
Nominal=c(1000,1000,800,500,900,250,350,250,350,100))
##idea
dt$key <- paste(dt$Name,dt$Strike,dt$Maturity)
dt <- dt[order(dt$key,decreasing = FALSE),]
dt$Type2 <- ifelse(dt$Type == "Call",1,0)
#for each line k, test value in the column "Key" and the column "Type2":
#if key(k) = key(k+1) and Type2(k)+Type2(k+1)=1 then
#if Nominal (k)> Nominal (k+1), delete the line k+1 and do the netting on nominal of the line k
#else (Nominal(k) < Nominal(k+1)), delete the line k and do the netting on nominal of the line k+1
#next k
dt <- dt[dt$Nominal!=0,]
dt$key <- NULL
Following the suggested idea, I tried the dcast solution, but it does not seem to do the netting correctly, as shown below:
> dt <- data.table(Type=c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put","Put"),
+ Name=c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google", "Microsoft", "Microsoft","Ebay"),
+ Strike=c(10,10,8,8,5,2,2,2,2.5,8),
+ Maturity=c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018", "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018","2/8/2018"),
+ Nominal=c(1000,1000,800,500,900,250,350,250,350,100))
> dcast(dt, Name + Maturity + Strike ~ Type, value.var="Nominal", fill = 0)[, Net := Call - Put][Net != 0]
Aggregate function missing, defaulting to 'length'
Name Maturity Strike Call Put Net
1: Ebay 2/8/2018 8.0 1 2 -1
2: Facebook 5/5/2018 5.0 1 0 1
3: Microsoft 19/3/2018 2.0 1 0 1
4: Microsoft 19/3/2018 2.5 0 1 -1
Answer 0 (score: 0)
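Here is a tidyverse solution. Basically, since you want to group all rows that share the same Name, Strike and Maturity, I think it is simplest to convert the Call and Put nominals into actual signed numbers and use summarise. Your special offsetting case is then just the netting case where the total ends up being 0, and those rows are removed.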
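The approach is:
1. Convert the Nominal of Put rows to negative values using ifelse inside mutate,
2. Use group_by and summarise to reduce each group to a single value,
3. Remove the perfect offsets (net of 0) with filter,
4. Recreate the Type column and make the negative values positive.
Code: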
library(tidyverse)
tbl <- read_table2(
"Type Name Strike Maturity Nominal
Call Amazon 10 10/12/2018 1000
Put Amazon 10 10/12/2018 1000
Call Ebay 8 2/8/2018 800
Put Ebay 8 2/8/2018 500
Call Facebook 5 5/5/2018 900
Call Google 2 23/4/2018 250
Put Google 2 23/4/2018 350
Call Microsoft 2 19/3/2018 250
Put Microsoft 2.5 19/3/2018 350
Put Ebay 8 2/8/2018 100"
)
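tbl %>%
  # sign the nominal: Call positive, Put negative
  mutate(actual = ifelse(Type == "Call", Nominal, -Nominal)) %>%
  # one net value per contract
  group_by(Name, Strike, Maturity) %>%
  summarise(Net = sum(actual)) %>%
  # drop contracts that offset completely
  filter(Net != 0) %>%
  # recover the Type from the sign, then make the value positive
  mutate(
    Type = ifelse(Net > 0, "Call", "Put"),
    Net = abs(Net)
  )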
# A tibble: 5 x 5
# Groups: Name, Strike [5]
Name Strike Maturity Net Type
<chr> <dbl> <chr> <int> <chr>
1 Ebay 8.00 2/8/2018 200 Call
2 Facebook 5.00 5/5/2018 900 Call
3 Google 2.00 23/4/2018 100 Put
4 Microsoft 2.00 19/3/2018 250 Call
5 Microsoft 2.50 19/3/2018 350 Put
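Since the question started from data.table, the same idea can also be sketched there. This is only a minimal sketch, assuming the dt object from the question with Nominal entered as 350 for the Microsoft Put; the names net and wide are illustrative and not from the original post. The second part shows the dcast call from the question with fun.aggregate = sum supplied explicitly, because the "Aggregate function missing, defaulting to 'length'" message means dcast was counting rows rather than summing nominals.

```r
library(data.table)

# Data from the question (Microsoft Put nominal taken as 350, as in the source table).
dt <- data.table(
  Type     = c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put", "Put"),
  Name     = c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google",
               "Microsoft", "Microsoft", "Ebay"),
  Strike   = c(10, 10, 8, 8, 5, 2, 2, 2, 2.5, 8),
  Maturity = c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018",
               "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018", "2/8/2018"),
  Nominal  = c(1000, 1000, 800, 500, 900, 250, 350, 250, 350, 100)
)

# 1) Grouped sum: sign the nominal (Call positive, Put negative) and net per contract.
net <- dt[, .(Net = sum(ifelse(Type == "Call", Nominal, -Nominal))),
          by = .(Name, Strike, Maturity)]

# Drop contracts that offset completely, then recover Type from the sign.
net <- net[Net != 0]
net[, `:=`(Type = ifelse(Net > 0, "Call", "Put"), Nominal = abs(Net))]
net[, Net := NULL]
setcolorder(net, c("Type", "Name", "Strike", "Maturity", "Nominal"))
print(net)

# 2) The dcast attempt from the question, with an explicit aggregate function
#    so that nominals are summed instead of counted.
wide <- dcast(dt, Name + Maturity + Strike ~ Type,
              value.var = "Nominal", fun.aggregate = sum, fill = 0)
wide <- wide[, Net := Call - Put][Net != 0]
print(wide)
```

In the wide result a negative Net indicates a remaining Put position, so the Type and positive Nominal columns would still have to be derived from the sign, as in the first part.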