Collapsing each item's 2 rows into one to do item-specific calculations on row 1 and row 2

Time: 2017-12-22 06:55:28

Tags: r dplyr

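I have managed to manipulate my data so that the multiple buys/sells for each item are reduced to 2 rows, one for BUY and one for SELL. My data frame currently looks like this: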

Market   Type `After Fees Collapsed` `AFC Signed`
 <chr>  <chr>                  <dbl>        <dbl>
ITEM_1    BUY             0.03220841  -0.03220841
ITEM_1   SELL             0.03251323   0.03251323
ITEM_2    BUY             0.05522072  -0.05522072
ITEM_2   SELL             0.01160392   0.01160392
ITEM_3    BUY             0.05707432  -0.05707432
ITEM_3   SELL             0.05759784   0.05759784
ITEM_4    BUY             0.03221925  -0.03221925
ITEM_4   SELL             0.03217333   0.03217333
ITEM_5    BUY             0.05070265  -0.05070265
ITEM_5   SELL             0.05118556   0.05118556

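The `AFC Signed` column is just the `After Fees Collapsed` column multiplied by -1 for BUY rows. That lets me compute the net result by running the code below. However, I believe that with the right answer to this question I could drop the `AFC Signed` column and use the values in my BUY and SELL rows directly.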

... %>%
    summarise(Nett = sum(`AFC Signed`))

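What I would like to do is collapse each ITEM_# into a single row, with one column for "Nett" (SELL - BUY for that ITEM) and one column for "% Nett" ((SELL - BUY) / BUY for that ITEM). The output would then look like this (numbers made up):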

Market    Nett  `% Nett`
ITEM_1    0.03      10%
ITEM_2    -0.4     -15%
ITEM_3     1.5    7.33%
ITEM_4   0.003    2.45%
ITEM_5  -1.468  -4.141%

3 answers:

Answer 0 (score: 2)

The basic idea for this kind of thing is generally (where d is your data set):

aggregate(`AFC Signed` ~ Market, d, sum)
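For illustration, here is that call run against the question's sample values. This is only a sketch: it assumes the columns have been given syntactic names (`After_Fees_Collapsed`, `AFC_Signed`) as in the data posted with the last answer, and it rebuilds the signed column itself. With the original column names you would keep the backticks.

```r
# Sample data from the question, with syntactic column names (an assumption)
d <- data.frame(
  Market = rep(paste0("ITEM_", 1:5), each = 2),
  Type = rep(c("BUY", "SELL"), times = 5),
  After_Fees_Collapsed = c(0.03220841, 0.03251323, 0.05522072, 0.01160392,
                           0.05707432, 0.05759784, 0.03221925, 0.03217333,
                           0.05070265, 0.05118556),
  stringsAsFactors = FALSE
)

# Signed value: negative for BUY rows, positive for SELL rows
d$AFC_Signed <- ifelse(d$Type == "BUY", -1, 1) * d$After_Fees_Collapsed

# One row per Market; summing the signed values gives SELL - BUY
net <- aggregate(AFC_Signed ~ Market, data = d, FUN = sum)
print(net)
```

This gives the net per item, but not the percentage, which is why the data model matters.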

But really, you would be better off fixing your data model. From the start, your data frame should have been:

item  buy_value  sell_value
...   ...        ...

instead of

item  type  value
id1   buy   ...
id1   sell  ...

Then you could simply do

d$net_pct = (d$sell_value - d$buy_value) / d$buy_value

Edit, for completeness: how to fix the data frame (it is simple):

d = d[order(d$Market),]
d2 = d[d$Type == 'BUY',]
d3 = d[d$Type == 'SELL',]
all(d2$Market == d3$Market) # should be true
d2$`Sell after Fees Collapsed` = d3$`After Fees Collapsed`
d2$net = d2$`Sell after Fees Collapsed` - d2$`After Fees Collapsed`
d2$net_pct = d2$net / d2$`After Fees Collapsed`
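As an alternative sketch of the same fix, base R's `reshape()` can produce the wide layout without relying on the BUY and SELL rows lining up after sorting. The column names `buy_value` and `sell_value` follow the layout suggested above; the syntactic name `After_Fees_Collapsed` is an assumption taken from the data posted with the last answer.

```r
# Sample data from the question, with a syntactic column name (an assumption)
d <- data.frame(
  Market = rep(paste0("ITEM_", 1:5), each = 2),
  Type = rep(c("BUY", "SELL"), times = 5),
  After_Fees_Collapsed = c(0.03220841, 0.03251323, 0.05522072, 0.01160392,
                           0.05707432, 0.05759784, 0.03221925, 0.03217333,
                           0.05070265, 0.05118556),
  stringsAsFactors = FALSE
)

# Long to wide: one row per Market, one value column per Type
wide <- reshape(d, idvar = "Market", timevar = "Type", direction = "wide")

# Rename by name rather than by position
names(wide)[names(wide) == "After_Fees_Collapsed.BUY"] <- "buy_value"
names(wide)[names(wide) == "After_Fees_Collapsed.SELL"] <- "sell_value"

wide$net <- wide$sell_value - wide$buy_value
wide$net_pct <- wide$net / wide$buy_value
print(wide)
```

Because `reshape()` matches rows on `Market` and `Type`, the result does not depend on the row order of `d`.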

Answer 1 (score: 2)

  aggregate(.~Market,dat[1:3],function(x)c(a<-diff(x),a/x[1]))
   Market Type.M Type.V2 X.AfterFeesCollapsed..M X.AfterFeesCollapsed..V2
 1 ITEM_1      1       1             0.000304820              0.009463988
 2 ITEM_2      1       1            -0.043616800             -0.789862936
 3 ITEM_3      1       1             0.000523520              0.009172602
 4 ITEM_4      1       1            -0.000045920             -0.001425235
 5 ITEM_5      1       1             0.000482910              0.009524354
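The one-liner above relies on each Market having exactly two rows, BUY first and SELL second, so that `diff(x)` is SELL - BUY and `x[1]` is the BUY value. A sketch of the same idea restricted to the numeric column, with named results, might look like this. The names `Nett` and `pNett` and the syntactic column name `After_Fees_Collapsed` are assumptions made for illustration.

```r
# Sample data from the question, rows ordered BUY then SELL within each Market
d <- data.frame(
  Market = rep(paste0("ITEM_", 1:5), each = 2),
  Type = rep(c("BUY", "SELL"), times = 5),
  After_Fees_Collapsed = c(0.03220841, 0.03251323, 0.05522072, 0.01160392,
                           0.05707432, 0.05759784, 0.03221925, 0.03217333,
                           0.05070265, 0.05118556),
  stringsAsFactors = FALSE
)

# diff(x) is SELL - BUY; dividing by x[1] (the BUY value) gives the ratio
res <- aggregate(After_Fees_Collapsed ~ Market, data = d,
                 FUN = function(x) c(Nett = diff(x), pNett = diff(x) / x[1]))

# aggregate() stores the two results as one matrix column; flatten it
res <- do.call(data.frame, res)
print(res)
```

If the rows might not be ordered BUY then SELL, sort by `Market` and `Type` first.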

Answer 2 (score: 1)

An approach using dplyr (which you are apparently already using):

d %>% 
  group_by(Market) %>% 
  summarise(Nett = After_Fees_Collapsed[Type == 'SELL'] - After_Fees_Collapsed[Type == 'BUY'],
            pNett = 100 * Nett / After_Fees_Collapsed[Type == 'BUY'])

which gives:

# A tibble: 5 x 3
  Market        Nett       pNett
  <fctr>       <dbl>       <dbl>
1 ITEM_1  0.00030482   0.9463988
2 ITEM_2 -0.04361680 -78.9862936
3 ITEM_3  0.00052352   0.9172602
4 ITEM_4 -0.00004592  -0.1425235
5 ITEM_5  0.00048291   0.9524354
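If the percentage should be shown as text with a percent sign, as in the desired output in the question, one possible variation is to format it with `sprintf()`. This is a sketch only: the intermediate columns `buy` and `sell` are names introduced here for readability, and the two-decimal format is an assumption.

```r
library(dplyr)

# Sample data from the question, with a syntactic column name
d <- data.frame(
  Market = rep(paste0("ITEM_", 1:5), each = 2),
  Type = rep(c("BUY", "SELL"), times = 5),
  After_Fees_Collapsed = c(0.03220841, 0.03251323, 0.05522072, 0.01160392,
                           0.05707432, 0.05759784, 0.03221925, 0.03217333,
                           0.05070265, 0.05118556),
  stringsAsFactors = FALSE
)

out <- d %>%
  group_by(Market) %>%
  summarise(
    buy  = After_Fees_Collapsed[Type == "BUY"],   # assumes exactly one BUY row
    sell = After_Fees_Collapsed[Type == "SELL"],  # assumes exactly one SELL row
    Nett = sell - buy,
    `% Nett` = sprintf("%.2f%%", 100 * Nett / buy)
  ) %>%
  select(Market, Nett, `% Nett`)

print(out)
```

Note that `% Nett` is then a character column, so keep a numeric copy if you need to sort or do further arithmetic on it.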

Data used:

d <- structure(list(Market = c("ITEM_1", "ITEM_1", "ITEM_2", "ITEM_2", "ITEM_3", "ITEM_3", "ITEM_4", "ITEM_4", "ITEM_5", "ITEM_5"), 
                    Type = c("BUY", "SELL", "BUY", "SELL", "BUY", "SELL", "BUY", "SELL", "BUY", "SELL"),
                    After_Fees_Collapsed = c(0.03220841, 0.03251323, 0.05522072, 0.01160392, 0.05707432, 0.05759784, 0.03221925, 0.03217333, 0.05070265, 0.05118556),
                    AFC_Signed = c(-0.03220841, 0.03251323, -0.05522072, 0.01160392, -0.05707432, 0.05759784, -0.03221925, 0.03217333, -0.05070265, 0.05118556)),
               .Names = c("Market", "Type", "After_Fees_Collapsed", "AFC_Signed"), class = "data.frame", row.names = c(NA, -10L))