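Netting in a data frame

Time: 2018-02-14 20:29:22

Tags: r dataframe subtotal

I have a data frame that I would like to clean up by removing some offsetting lines (boxed positions) and netting some others.

Here is the source table: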

    Type  Name     Strike  Maturity    Nominal
    Call  Amazon    10     10/12/2018  1000
    Put   Amazon    10     10/12/2018  1000
    Call  Ebay      8      2/8/2018    800
    Put   Ebay      8      2/8/2018    500
    Call  Facebook  5      5/5/2018    900
    Call  Google    2      23/4/2018   250
    Put   Google    2      23/4/2018   350
    Call  Microsoft 2      19/3/2018   250
    Put   Microsoft 2.5    19/3/2018   350
    Put   Ebay      8      2/8/2018    100

The result the code should produce is:

    Type  Name      Strike  Maturity   Nominal
    Call  Ebay      8       2/8/2018   200
    Call  Facebook  5       5/5/2018   900
    Put   Google    2       23/4/2018  100
    Call  Microsoft 2       19/3/2018  250
    Put   Microsoft 2.5     19/3/2018  350

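I am trying to write code in R that performs these 3 tasks:

1 // Remove all pairs that offset each other. An offsetting pair is one that meets both of these criteria:

  • 2 rows with the same Name, Strike, Maturity and Nominal.
  • 1 row is a "Call" and the other is a "Put".

Example: the 2 "Amazon" lines, which have been removed from the table.

2 // Net the nominal of lines that do not offset each other perfectly. A pair that does not offset perfectly is one that meets both of these criteria:

  • 2 rows with the same Name, Strike and Maturity but a different Nominal.
  • 1 row is a "Call" and the other is a "Put".

Example: the 2 "Ebay" lines that have been netted on the Call, or the 2 "Google" lines that have been netted on the Put.

3 // Do nothing to all other rows.

Example: the 2 "Microsoft" lines. They have different strikes, so they should not be netted at all.

Please see my first attempt below. My idea was to first create a new column with a unique key, then sort by it alphabetically, then test each row one by one. I find this very laborious, so I was wondering whether someone could help me find a simpler and more efficient solution. Many thanks!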

library(data.table)

dt <- data.table(Type=c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put","Put"),
                 Name=c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google", "Microsoft", "Microsoft","Ebay"),
                 Strike=c(10,10,8,8,5,2,2,2,2.5,8),
                 Maturity=c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018", "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018","2/8/2018"),
                 Nominal=c(1000,1000,800,500,900,250,350,250,350,100))

##idea
dt$key <- paste(dt$Name,dt$Strike,dt$Maturity)
dt <- dt[order(dt$key,decreasing = FALSE),]   # assign the result, otherwise the sort is lost
dt$Type2 <- ifelse(dt$Type == "Call",1,0)     # == compares; = would be an assignment

#for each line k, test value in the column "key" and the column "Type2":
#if key(k) == key(k+1) and Type2(k)+Type2(k+1) == 1 then 
    #if Nominal(k) > Nominal(k+1), delete the line k+1 and do the netting on nominal of the line k
    #else (Nominal(k) <= Nominal(k+1)), delete the line k and do the netting on nominal of the line k+1
#next k

dt <- dt[dt$Nominal!=0,]
dt$key <- NULL

Following a suggested idea, I tried the dcast solution, but it does not seem to do the netting correctly, as shown below:

> dt <- data.table(Type=c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put","Put"),
+                  Name=c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google", "Microsoft", "Microsoft","Ebay"),
+                  Strike=c(10,10,8,8,5,2,2,2,2.5,8),
+                  Maturity=c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018", "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018","2/8/2018"),
+                  Nominal=c(1000,1000,800,500,900,250,350,250,350,100))
> dcast(dt, Name + Maturity + Strike ~ Type, value.var="Nominal", fill = 0)[, Net := Call - Put][Net != 0]
Aggregate function missing, defaulting to 'length'
        Name  Maturity Strike Call Put Net
1:      Ebay  2/8/2018    8.0    1   2  -1
2:  Facebook  5/5/2018    5.0    1   0   1
3: Microsoft 19/3/2018    2.0    1   0   1
4: Microsoft 19/3/2018    2.5    0   1  -1
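
The message `Aggregate function missing, defaulting to 'length'` points to a likely cause: because the Ebay group holds two Put rows, `dcast` has to aggregate, and without an aggregate function it counts rows rather than adding nominals. A minimal sketch of the same call with `fun.aggregate = sum`, assuming the `dt` from the transcript above (the names `wide` and `net` are only illustrative):

```r
library(data.table)

dt <- data.table(
  Type     = c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put", "Put"),
  Name     = c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google",
               "Microsoft", "Microsoft", "Ebay"),
  Strike   = c(10, 10, 8, 8, 5, 2, 2, 2, 2.5, 8),
  Maturity = c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018",
               "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018", "2/8/2018"),
  Nominal  = c(1000, 1000, 800, 500, 900, 250, 350, 250, 350, 100)
)

# fun.aggregate = sum adds up the nominals in each cell instead of counting rows;
# fill = 0 covers groups that have only a Call or only a Put.
wide <- dcast(dt, Name + Maturity + Strike ~ Type,
              value.var = "Nominal", fun.aggregate = sum, fill = 0)

# Positive Net means a residual Call position, negative Net a residual Put.
wide[, Net := Call - Put]
net <- wide[Net != 0]
print(net)
```

With this change the Amazon pair nets to 0 and drops out, and the sign of `Net` indicates which side of each remaining position is left over.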

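1 answer:

Answer 0 (score: 0)

Here is a tidyverse solution. Basically, since you want to group all rows that share the same Name, Strike and Maturity, I think it is simplest to convert Call and Put into signed numbers and use summarise. Your special offsetting case is then just a matter of removing the netted cases where the total ends up being 0.

The approach is:

  1. Use mutate and ifelse to convert the Nominal of Put rows to negative values,
  2. Use group_by and summarise to reduce each group to a single value,
  3. Use filter to remove the perfect offsets,
  4. Restore the Type column and make the negative values positive.

Code: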

    library(tidyverse)
    tbl <- read_table2(
      "Type  Name     Strike  Maturity    Nominal
      Call  Amazon    10     10/12/2018  1000
      Put   Amazon    10     10/12/2018  1000
      Call  Ebay      8      2/8/2018    800
      Put   Ebay      8      2/8/2018    500
      Call  Facebook  5      5/5/2018    900
      Call  Google    2      23/4/2018   250
      Put   Google    2      23/4/2018   350
      Call  Microsoft 2      19/3/2018   250
      Put   Microsoft 2.5    19/3/2018   350
      Put   Ebay      8      2/8/2018    100"
    )
    
    tbl %>%
      mutate(actual = ifelse(Type == "Call", Nominal, -Nominal)) %>%
      group_by(Name, Strike, Maturity) %>%
      summarise(Net = sum(actual)) %>%
      filter(Net != 0) %>%
      mutate(
        Type = ifelse(Net > 0, "Call", "Put"),
        Net = abs(Net)
        )
    # A tibble: 5 x 5
    # Groups:   Name, Strike [5]
      Name      Strike Maturity    Net Type 
      <chr>      <dbl> <chr>     <int> <chr>
    1 Ebay        8.00 2/8/2018    200 Call 
    2 Facebook    5.00 5/5/2018    900 Call 
    3 Google      2.00 23/4/2018   100 Put  
    4 Microsoft   2.00 19/3/2018   250 Call 
    5 Microsoft   2.50 19/3/2018   350 Put
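
Since the question starts from a data.table, the same four steps can also be sketched in data.table syntax. This is only an illustration of the signed-sum idea above, not a tested drop-in replacement; the column names `Signed` and `Net` and the object `net` are assumptions introduced here.

```r
library(data.table)

dt <- data.table(
  Type     = c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put", "Put"),
  Name     = c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google",
               "Microsoft", "Microsoft", "Ebay"),
  Strike   = c(10, 10, 8, 8, 5, 2, 2, 2, 2.5, 8),
  Maturity = c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018",
               "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018", "2/8/2018"),
  Nominal  = c(1000, 1000, 800, 500, 900, 250, 350, 250, 350, 100)
)

# Step 1: signed nominal, Calls positive and Puts negative.
dt[, Signed := ifelse(Type == "Call", Nominal, -Nominal)]

# Step 2: one net figure per (Name, Strike, Maturity); keyby also sorts the groups.
net <- dt[, .(Net = sum(Signed)), keyby = .(Name, Strike, Maturity)]

# Step 3: drop the perfect offsets.
net <- net[Net != 0]

# Step 4: restore Type and a positive Nominal, then match the original column order.
net[, `:=`(Type = ifelse(Net > 0, "Call", "Put"), Nominal = abs(Net))]
net[, Net := NULL]
setcolorder(net, c("Type", "Name", "Strike", "Maturity", "Nominal"))
print(net)
```

Grouping on all three key columns keeps the two Microsoft lines apart because their strikes differ, and it handles the three Ebay lines in one pass without any row-by-row comparison.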