Selecting the correct range of values within a function

Time: 2011-01-10 06:42:57

Tags: function r subset

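I'm trying to create a custom function using the maxdrawdown function from the tseries package. I want to apply this function to the correct range of values within the function, but even though this is a very beginner question, I can't see a possible solution.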

This is what my data frame looks like:

> subSetTrades
   Instrument  EntryTime   ExitTime AccountValue
1         JPM 2007-03-01 2007-04-10         6997
2         JPM 2007-04-10 2007-05-29         7261
3         JPM 2007-05-29 2007-07-18         7545
4         JPM 2007-07-18 2007-07-19         7614
5         JPM 2007-07-19 2007-08-22         7897
6         JPM 2007-08-22 2007-08-28         7678
7         JPM 2007-08-28 2007-09-17         7587
8         JPM 2007-09-17 2007-10-17         7752
9         JPM 2007-10-17 2007-10-29         7717
10        JPM 2007-10-29 2007-11-02         7423
11        KFT 2007-04-13 2007-05-14         6992
12        KFT 2007-05-14 2007-05-21         6944
13        KFT 2007-05-21 2007-07-09         7069
14        KFT 2007-07-09 2007-07-16         6919
15        KFT 2007-07-16 2007-07-27         6713
16        KFT 2007-07-27 2007-09-07         6820
17        KFT 2007-09-07 2007-10-12         6927
18        KFT 2007-10-12 2007-11-28         6983
19        KFT 2007-11-28 2007-12-18         6957
20        KFT 2007-12-18 2008-02-20         7146
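To make the examples reproducible, here is a sketch that rebuilds the data frame from the printed output above. The question does not show the column classes, so this assumes EntryTime and ExitTime are Date columns and Instrument is a factor.

```r
# Rebuild subSetTrades from the printed output.
# Assumption: the date columns are Date objects and Instrument is a factor.
subSetTrades <- data.frame(
  Instrument = factor(rep(c("JPM", "KFT"), each = 10)),
  EntryTime = as.Date(c(
    "2007-03-01", "2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19",
    "2007-08-22", "2007-08-28", "2007-09-17", "2007-10-17", "2007-10-29",
    "2007-04-13", "2007-05-14", "2007-05-21", "2007-07-09", "2007-07-16",
    "2007-07-27", "2007-09-07", "2007-10-12", "2007-11-28", "2007-12-18")),
  ExitTime = as.Date(c(
    "2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19", "2007-08-22",
    "2007-08-28", "2007-09-17", "2007-10-17", "2007-10-29", "2007-11-02",
    "2007-05-14", "2007-05-21", "2007-07-09", "2007-07-16", "2007-07-27",
    "2007-09-07", "2007-10-12", "2007-11-28", "2007-12-18", "2008-02-20")),
  AccountValue = c(6997, 7261, 7545, 7614, 7897, 7678, 7587, 7752, 7717, 7423,
                   6992, 6944, 7069, 6919, 6713, 6820, 6927, 6983, 6957, 7146)
)

# Check the structure: 20 rows, 4 columns
str(subSetTrades)
```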

If I manually calculate the values I want my function to return, the results are correct:

> # Apply the function to the data frame
> with(subSetTrades, tapply(AccountValue, Instrument, MDD_Duration))
JPM KFT 
106  85 
> # Check the function for JPM
> maxdrawdown(subSetTrades[1:10,4])$from
[1] 5
> maxdrawdown(subSetTrades[1:10,4])$to
[1] 10
> # Get the entry time for JPM on row 5
> # Get the exit time for JPM on row 10
> # Calculate the time difference
> difftime(subSetTrades[10,3], subSetTrades[5,2], units='days')
Time difference of 106 days
> # Check the calculations for the other instrument
> maxdrawdown(subSetTrades[11:20,4])$from
[1] 3
> maxdrawdown(subSetTrades[11:20,4])$to
[1] 5
> # Get the ExitTime on row 5 for KFT and the EntryTime on row 3 for KFT,
> # and calculate the time difference
> difftime(subSetTrades[15,3], subSetTrades[13,2])
Time difference of 67 days

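As you can see in the example above, my custom function (MDD_Duration) gives the correct value for JPM but the wrong value for KFT: instead of 85, the result should be 67. The function MDD_Duration is as follows: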

MDD_Duration <- function(x){
    require(tseries)
    # Get starting point
    mdd_Start <- maxdrawdown(x)$from
    mdd_StartDate <- subSetTrades$EntryTime[mdd_Start]
    # Get the endpoint
    mdd_End <- maxdrawdown(x)$to
    mdd_EndDate <- subSetTrades$ExitTime[mdd_End]
    return(difftime(mdd_EndDate, mdd_StartDate, units='days'))
}

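Manually retracing the steps of this custom function shows that the problem lies in how the 'from' and 'to' row numbers are used in the calculation (i.e., for KFT, R needs to offset these values by the length of the instrument that precedes it, in this case JPM). For a possible solution, R would need to do the following: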

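If this instrument is the first one (i.e., at the top of the list), take the 'from' value returned by the maxdrawdown function. But if the current instrument is the second (or third, etc.), take the length of the preceding instrument(s) into account. So if instrument JPM has a length of 10, the search for the KFT values should start at +10. The search for the 'from' and 'to' values of instrument 3 should start at the length of instrument 1 + the length of instrument 2.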
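The diagnosis and the proposed offset can be checked with a sketch that runs without tseries. `mdd_idx()` is a hypothetical base-R stand-in for `tseries::maxdrawdown()` that returns only the first peak/trough pair (the real function can return several indices when drawdowns tie). It assumes the rows are grouped by instrument in factor-level order, which the offset approach depends on.

```r
# Hypothetical stand-in for tseries::maxdrawdown(), first peak/trough pair only:
# "from" is the running peak, "to" the trough of the largest drawdown.
mdd_idx <- function(x) {
  dd <- cummax(x) - x                       # distance below the running peak
  to <- which.max(dd)                       # first trough of the largest drawdown
  from <- max(which(dd[seq_len(to)] == 0))  # last peak at or before the trough
  list(maxdrawdown = dd[to], from = from, to = to)
}

# subSetTrades as printed above (assumed Date columns, factor Instrument)
subSetTrades <- data.frame(
  Instrument = factor(rep(c("JPM", "KFT"), each = 10)),
  EntryTime = as.Date(c(
    "2007-03-01", "2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19",
    "2007-08-22", "2007-08-28", "2007-09-17", "2007-10-17", "2007-10-29",
    "2007-04-13", "2007-05-14", "2007-05-21", "2007-07-09", "2007-07-16",
    "2007-07-27", "2007-09-07", "2007-10-12", "2007-11-28", "2007-12-18")),
  ExitTime = as.Date(c(
    "2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19", "2007-08-22",
    "2007-08-28", "2007-09-17", "2007-10-17", "2007-10-29", "2007-11-02",
    "2007-05-14", "2007-05-21", "2007-07-09", "2007-07-16", "2007-07-27",
    "2007-09-07", "2007-10-12", "2007-11-28", "2007-12-18", "2008-02-20")),
  AccountValue = c(6997, 7261, 7545, 7614, 7897, 7678, 7587, 7752, 7717, 7423,
                   6992, 6944, 7069, 6919, 6713, 6820, 6927, 6983, 6957, 7146)
)

# The bug: KFT's indices (3 and 5) are positions within KFT, but MDD_Duration
# uses them on the whole data frame, where rows 3 and 5 belong to JPM.
kft <- mdd_idx(subSetTrades$AccountValue[subSetTrades$Instrument == "KFT"])
wrong <- difftime(subSetTrades$ExitTime[kft$to],
                  subSetTrades$EntryTime[kft$from], units = "days")

# The offset idea: add the row count of all preceding instruments.
inst <- levels(subSetTrades$Instrument)
n_per <- as.vector(table(subSetTrades$Instrument))  # rows per instrument
offset <- c(0, cumsum(n_per)[-length(n_per)])       # rows before each instrument
durations <- sapply(seq_along(inst), function(i) {
  idx <- mdd_idx(subSetTrades$AccountValue[offset[i] + seq_len(n_per[i])])
  as.numeric(difftime(subSetTrades$ExitTime[offset[i] + idx$to],
                      subSetTrades$EntryTime[offset[i] + idx$from],
                      units = "days"))
})
names(durations) <- inst

wrong      # 85 days: JPM dates picked up for KFT
durations  # JPM 106, KFT 67
```

This reproduces the values computed by hand, but the bookkeeping silently breaks if the rows are not grouped and ordered by instrument; handing each instrument's own rows to the function avoids it entirely.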

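I tried using nrow in the function (which seemed to be the obvious solution), but that resulted in an error about an 'argument of length 0', even though nrow was used correctly (i.e., the same statement outside the function does work). I also tried subsetting the data inside the function, which didn't work either. Any ideas are very welcome. :)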
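One likely explanation, assuming nrow() was applied to the x inside MDD_Duration: tapply() passes the function only the AccountValue values of one instrument, as a plain vector, and nrow() returns NULL for a vector. Outside the function the statement presumably ran on the data frame itself, which does have rows. A minimal illustration:

```r
# What tapply(AccountValue, Instrument, FUN) hands to FUN: a plain vector
x <- c(6992, 6944, 7069, 6919, 6713)

nrow(x)     # NULL: a vector has no dim attribute
NROW(x)     # 5: NROW() treats a vector as a one-column matrix
length(x)   # 5

# Using that NULL in a condition produces a length-zero error
msg <- tryCatch(if (nrow(x) > 0) "has rows",
                error = function(e) conditionMessage(e))
msg
```

Even with a correct row count, the vector does not carry EntryTime or ExitTime, so the function still has no way to look up the dates belonging to that instrument.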

1 answer:

Answer 0 (score: 2)

split is your friend. If I modify your function so that it takes a data frame with the three variables of interest (AccountValue, EntryTime, ExitTime), like this:

MDD_Duration <- function(x){
    # require(tseries)
    # Get starting point
    mdd_Start <- maxdrawdown(x$AccountValue)$from
    mdd_StartDate <- x$EntryTime[mdd_Start]
    # Get the endpoint
    mdd_End <- maxdrawdown(x$AccountValue)$to
    mdd_EndDate <- x$ExitTime[mdd_End]
    return(difftime(mdd_EndDate, mdd_StartDate, units='days'))
}

we can apply it to a split version of your data:

> sapply(split(subSetTrades[,-1], subSetTrades[,1]), MDD_Duration)
JPM KFT 
106  67

It may help to look at what split does to your data:

> split(subSetTrades[,-1], subSetTrades[,1])
$JPM
    EntryTime   ExitTime AccountValue
1  2007-03-01 2007-04-10         6997
2  2007-04-10 2007-05-29         7261
3  2007-05-29 2007-07-18         7545
4  2007-07-18 2007-07-19         7614
5  2007-07-19 2007-08-22         7897
6  2007-08-22 2007-08-28         7678
7  2007-08-28 2007-09-17         7587
8  2007-09-17 2007-10-17         7752
9  2007-10-17 2007-10-29         7717
10 2007-10-29 2007-11-02         7423

$KFT
    EntryTime   ExitTime AccountValue
11 2007-04-13 2007-05-14         6992
12 2007-05-14 2007-05-21         6944
13 2007-05-21 2007-07-09         7069
14 2007-07-09 2007-07-16         6919
15 2007-07-16 2007-07-27         6713
16 2007-07-27 2007-09-07         6820
17 2007-09-07 2007-10-12         6927
18 2007-10-12 2007-11-28         6983
19 2007-11-28 2007-12-18         6957
20 2007-12-18 2008-02-20         7146
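The important detail is that each piece is indexed from 1 again, so the 'from'/'to' positions that maxdrawdown() returns for a piece point at that piece's own rows. A small sketch checking this; the data frame is rebuilt from the printed output, assuming Date columns and a factor Instrument:

```r
subSetTrades <- data.frame(
  Instrument = factor(rep(c("JPM", "KFT"), each = 10)),
  EntryTime = as.Date(c(
    "2007-03-01", "2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19",
    "2007-08-22", "2007-08-28", "2007-09-17", "2007-10-17", "2007-10-29",
    "2007-04-13", "2007-05-14", "2007-05-21", "2007-07-09", "2007-07-16",
    "2007-07-27", "2007-09-07", "2007-10-12", "2007-11-28", "2007-12-18")),
  ExitTime = as.Date(c(
    "2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19", "2007-08-22",
    "2007-08-28", "2007-09-17", "2007-10-17", "2007-10-29", "2007-11-02",
    "2007-05-14", "2007-05-21", "2007-07-09", "2007-07-16", "2007-07-27",
    "2007-09-07", "2007-10-12", "2007-11-28", "2007-12-18", "2008-02-20")),
  AccountValue = c(6997, 7261, 7545, 7614, 7897, 7678, 7587, 7752, 7717, 7423,
                   6992, 6944, 7069, 6919, 6713, 6820, 6927, 6983, 6957, 7146)
)

pieces <- split(subSetTrades[, -1], subSetTrades[, 1])
kft <- pieces$KFT

rownames(kft)[1:3]         # "11" "12" "13": the original row names are kept
kft$EntryTime[3]           # but positions restart at 1: this is row 13 overall
subSetTrades$EntryTime[3]  # position 3 of the whole frame is a JPM row
```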

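So, as long as you have a function that accepts and works on a data frame (a subset of your data set), we can use split to form the subsets and lapply or sapply to apply our function to those subsets.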
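A minimal sketch of this split-then-apply pattern, using a hypothetical toy data frame and a hypothetical per-group function span_days() in place of MDD_Duration. It also shows why the sapply() result above prints as plain numbers: lapply() keeps each group's difftime object, while sapply() simplifies the list with unlist(), which drops the difftime class.

```r
# Hypothetical toy data: one row per trade, grouped by g
df <- data.frame(
  g = c("a", "a", "b", "b", "b"),
  start = as.Date(c("2007-01-01", "2007-01-05",
                    "2007-02-01", "2007-02-03", "2007-02-10")),
  end = as.Date(c("2007-01-05", "2007-01-20",
                  "2007-02-03", "2007-02-10", "2007-02-28"))
)

# Hypothetical per-group function: it receives one group's data frame
span_days <- function(d) difftime(max(d$end), min(d$start), units = "days")

pieces <- split(df, df$g)      # list of data frames, one per group
l <- lapply(pieces, span_days) # list of difftime objects
s <- sapply(pieces, span_days) # simplified to a named numeric vector

l
s
```

Use lapply() when the per-group result should keep its class (or is not a single value), and sapply() when a compact named vector is what you want.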

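You may want to incorporate this into your function MDD_Duration():

MDD_Duration2 <- function(x){
    require(tseries)  # provides maxdrawdown()
    FUN <- function(x) {
        # Get starting point
        mdd_Start <- maxdrawdown(x$AccountValue)$from
        mdd_StartDate <- x$EntryTime[mdd_Start]
        # Get the endpoint
        mdd_End <- maxdrawdown(x$AccountValue)$to
        mdd_EndDate <- x$ExitTime[mdd_End]
        return(difftime(mdd_EndDate, mdd_StartDate, units='days'))
    }
    # droplevels() expects Instrument to be a factor
    # (the data.frame() default before R 4.0.0)
    sapply(split(x, droplevels(x[, "Instrument"])), FUN)
}

We use the new (in R 2.12.x) function droplevels on x[, "Instrument"] to allow the function to work even if we have single-level data or are operating on a subset of the data.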
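The effect of droplevels() can be seen in a small sketch with a hypothetical toy data frame. Subsetting keeps every factor level, so split() returns an empty data frame for the missing level, and a per-group function such as FUN would then receive zero rows. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so Instrument may be a character column; droplevels() is meant for factors (and data frames), whereas split(x, x$Instrument, drop = TRUE) covers both cases.

```r
# Hypothetical toy data: a factor with two levels, like the Instrument column
df <- data.frame(g = factor(c("a", "a", "b", "b")), v = 1:4)
sub <- df[df$g == "a", ]    # keep the rows of one level only

levels(sub$g)               # "a" "b": the unused level is still there

# Without dropping, split() yields an empty piece for level "b"
n_all <- sapply(split(sub, sub$g), nrow)
# droplevels() removes the unused level first
n_dl <- sapply(split(sub, droplevels(sub$g)), nrow)
# split(drop = TRUE) does the same and also works for character columns
n_drop <- sapply(split(sub, sub$g, drop = TRUE), nrow)

n_all
n_dl
n_drop
```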