Sum by one factor, conditional on another factor

Time: 2014-03-05 00:31:40

Tags: r dataframe

I am working with a data frame of stock information. This is what it looks like:

    > str(test)
'data.frame':   211717 obs. of  19 variables:
 $ Symbol        : Factor w/ 3378 levels "AACC","AACE",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ MktCategory   : Factor w/ 3 levels "","NNM","SCM": 2 2 2 2 2 2 2 2 2 2 ...
 $ TSO           : num  37205115 37205115 37205115 37205115 37205115 ...
 $ TSO_Date      : Factor w/ 200 levels "","1/1/2006",..: 137 137 137 137 137 137 137 137 137 137 ...
 $ X.OfMP        : int  56 56 56 56 56 56 56 56 56 56 ...
 $ MPID          : Factor w/ 670 levels "","ABLE","ABNA",..: 608 459 533 618 550 635 307 146 387 482 ...
 $ MP_type       : Factor w/ 4 levels "","C","M","NR": 2 3 4 3 3 3 3 4 3 4 ...
 $ Total_Vol     : int  32900 0 2949 758522 41316 706131 29300 16898 362569 1490 ...
 $ Total_Rank    : int  18 0 35 2 17 3 21 26 5 40 ...
 $ Total_Pct     : int  0 0 0 14 0 13 0 0 7 0 ...
 $ Block_Vol     : int  0 0 0 60800 20000 34900 19200 16600 0 0 ...
 $ Block_Rank    : int  0 0 0 2 6 4 7 9 0 0 ...
 $ Block_Pct     : int  0 0 0 15 5 9 5 4 0 0 ...
 $ YTD_Total_Vol : num  81615 2929 10684 1949230 190874 ...
 $ YTD_Total_Rank: int  28 59 44 3 17 5 30 27 12 67 ...
 $ YTD_Total_Pct : int  0 0 0 9 0 7 0 0 2 0 ...
 $ YTD_Block_Vol : int  0 0 0 197420 80000 390600 60900 73787 55994 0 ...
 $ YTD_Block_Rank: int  0 0 0 5 13 3 16 14 17 0 ...
 $ YTD_Block_Pct : int  0 0 0 6 3 12 2 2 2 0 ...

I know how to sum the total volume (Total_Vol) by Symbol with the aggregate function:

volbystock<-aggregate(test$Total_Vol,by=list(test$Symbol),FUN=sum)

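But I am trying to analyze the volume for only a handful of MPID values. I want to sum a symbol's Total_Vol only over the rows whose MPID appears in a separate list. In other words, a row's Total_Vol should count toward its symbol's total only if the corresponding MPID is one of the following: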

> use_MPID<-c("GSCO","LATS","TACT","INCA","LATS","LQNT","ITGI")

2 Answers:

Answer 0: (score: 1)

Using dplyr, you can do the following:

# load dplyr
library(dplyr)

# create a vector of the MPIDs you are interested in
use_MPID <- c("GSCO","LATS","TACT","INCA","LATS","LQNT","ITGI")

# create a fake dataset just for illustration
# (built with data.frame() directly so TotalVol stays numeric;
#  wrapping the columns in cbind() would coerce everything to character)
test <- data.frame(Symbol = c("ci", "di", "bi", "bi"),
                   MPID = c("GSCO","LATS","TACT","INCA"),
                   TotalVol = c(35, 110, 201, 435))

# keep only the MPIDs of interest, then sum TotalVol per Symbol
# (%>% is used here; the older %.% operator was deprecated in its favor)
volbystock <- test %>%
   filter(MPID %in% use_MPID) %>%
   group_by(Symbol) %>%
   summarise(TotalVol = sum(TotalVol))

Answer 1: (score: 0)

It looks like you can subset the data.frame before aggregating, as follows.

use_MPID <- c("GSCO","LATS","TACT","INCA","LATS","LQNT","ITGI")
relevant.symbols <- which(test$MPID %in% use_MPID)
volbystock <- aggregate(test$Total_Vol[relevant.symbols],
    by=list(test$Symbol[relevant.symbols]),
    FUN=sum)

Would this solve your problem?
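To see what the subsetting does, here is a minimal sketch on made-up data. The data frame `toy`, its values, and the name `keep` are invented for illustration; only the column names and `use_MPID` come from the question.

```r
# Made-up data for illustration only; column names mirror the question
toy <- data.frame(
  Symbol    = c("AACC", "AACC", "AACC", "AACE", "AACE"),
  MPID      = c("GSCO", "ABLE", "LATS", "INCA", "ABNA"),
  Total_Vol = c(100, 50, 25, 10, 999)
)
use_MPID <- c("GSCO","LATS","TACT","INCA","LATS","LQNT","ITGI")

# Logical index: TRUE where the row's MPID is one of use_MPID
keep <- toy$MPID %in% use_MPID

# Sum Total_Vol per Symbol over the kept rows only
volbystock <- aggregate(toy$Total_Vol[keep],
                        by = list(Symbol = toy$Symbol[keep]),
                        FUN = sum)
print(volbystock)
# AACC sums to 125 (100 + 25) and AACE to 10; the ABLE and ABNA rows are ignored
```

The repeated "LATS" in `use_MPID` is harmless, since `%in%` only tests membership. A logical vector can index directly, so wrapping it in `which()` is optional.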

Edit

Better still, you can use the optional subset argument together with a formula:

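use_MPID <- c("GSCO","LATS","TACT","INCA","LATS","LQNT","ITGI")
# the formula is passed unnamed, because the name of this argument differs across R versions;
# data=test lets the formula and subset refer to columns by name
volbystock <- aggregate(Total_Vol ~ Symbol,
    data=test,
    subset=(MPID %in% use_MPID),
    FUN=sum)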
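One thing to be aware of with the aggregate() variants: a symbol with no rows matching use_MPID is absent from the result rather than listed with a total of 0. If the zeros are wanted, one option is to zero out the non-matching volumes and aggregate over all rows. This is only a sketch on made-up data; `toy`, its values, and the helper column `vol_kept` are hypothetical names, not part of the original data.

```r
# Made-up data for illustration only; "AAON" has no MPID of interest
toy <- data.frame(
  Symbol    = c("AACC", "AACC", "AACE", "AAON"),
  MPID      = c("GSCO", "ABLE", "INCA", "ABNA"),
  Total_Vol = c(100, 50, 10, 999)
)
use_MPID <- c("GSCO","LATS","TACT","INCA","LATS","LQNT","ITGI")

# A row's volume counts only when its MPID is of interest; otherwise it contributes 0
toy$vol_kept <- ifelse(toy$MPID %in% use_MPID, toy$Total_Vol, 0)

# Aggregating over all rows keeps every symbol, including those that sum to 0
volbystock <- aggregate(vol_kept ~ Symbol, data = toy, FUN = sum)
print(volbystock)
# AACC gets 100, AACE gets 10, and AAON is kept with 0
```

Whether a missing row or an explicit 0 is preferable depends on how the totals are used afterwards, for example when merging them back onto the full list of symbols.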