Question

from datetime import datetime

import numpy as np
from numpy import cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn
# pandas.io.data was removed from pandas; DataReader now lives in the separate
# pandas-datareader package (its Yahoo source may no longer return data)
from pandas_datareader.data import DataReader

def hurst(ts):

    """Returns the Hurst Exponent of the time series vector ts"""
    # Work on a plain NumPy array: with a pandas Series, subtract(ts[lag:], ts[:-lag])
    # would align the two slices on their index instead of shifting them by lag
    ts = np.asarray(ts, dtype=float)

    # Create the range of lag values
    lags = range(2, 100)

    # Calculate the array of the variances of the lagged differences
    # Here it calculates the variances, but why does it use the
    # standard deviation and then take a square root of it?
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]

    # Use a linear fit to estimate the Hurst Exponent
    poly = polyfit(log(lags), log(tau), 1)

    # Return the Hurst exponent from the polyfit output
    return poly[0]*2.0


# Download the stock price series from Yahoo
aapl = DataReader("AAPL", "yahoo", datetime(2012,1,1), datetime(2015,9,18))

# Call the function
hurst(aapl['Adj Close'])
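The question imports randn but never uses it. One common sanity check (an assumption here, not something from the original post) is to run the estimator on synthetic series whose Hurst exponent is known: a Gaussian random walk should give roughly 0.5 and white noise roughly 0. A minimal sketch, with the function renamed to the hypothetical hurst_quantstart and fed plain NumPy arrays so no download is needed:

```python
import numpy as np

def hurst_quantstart(ts, max_lag=100):
    """Hurst exponent via the sqrt(std) formulation from the question."""
    ts = np.asarray(ts, dtype=float)
    lags = np.arange(2, max_lag)
    # sqrt of the std of the lagged differences, exactly as in the question
    tau = [np.sqrt(np.std(ts[lag:] - ts[:-lag])) for lag in lags]
    # The slope of log(tau) against log(lag) is H/2, hence the final factor of 2
    slope = np.polyfit(np.log(lags), np.log(tau), 1)[0]
    return 2.0 * slope

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
random_walk = np.cumsum(rng.standard_normal(100_000))  # expected H close to 0.5
white_noise = rng.standard_normal(100_000)             # expected H close to 0

h_rw = hurst_quantstart(random_walk)
h_wn = hurst_quantstart(white_noise)
print(f"random walk: {h_rw:.3f}, white noise: {h_wn:.3f}")
```

The exact values depend on the seed and series length; only their rough size (about 0.5 versus about 0) is meaningful.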

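In this code for estimating the Hurst exponent, when we want to compute the variance of the lagged differences, why do we still use the standard deviation and then take a square root? I have been confused about this for a long time, and I don't understand why others don't seem to share the same confusion. Am I misunderstanding the math behind it? Thanks!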

Answer 1

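I was confused too. I didn't understand where the sqrt of the std came from and spent 3 days trying to figure it out. Eventually I noticed that QuantStart credits Dr. Tom Starke for using slightly different code, and Dr. Tom Starke credits Dr. Ernie Chan and his blog. I was able to find enough information there to put together my own code from first principles. It does not use sqrt, it uses the variance instead of std, and it divides by 2.0 at the end instead of multiplying by 2.0. In the end it seems to give the same results as the QuantStart code you posted, but I was able to understand it from first principles, which I think is important. I put together a clearer Jupyter notebook, but I'm not sure whether I can post it here, so I'll do my best to explain here instead. First the code, then the explanation.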

import numpy as np
from numpy import log10, polyfit, subtract, var

lags = range(2,100)
def hurst_ernie_chan(p):

    # Use a plain NumPy array so the lagged slices are not aligned on a pandas index
    p = np.asarray(p, dtype=float)

    variancetau = []; tau = []

    for lag in lags: 

        #  Write the different lags into a vector to compute a set of tau or lags
        tau.append(lag)

        # Compute the differences between values lag steps apart (these are log returns
        # over lag days if p holds log prices), then compute the variance of those
        # differences; call this pp or the price difference
        pp = subtract(p[lag:], p[:-lag])
        variancetau.append(var(pp))

    # we now have a set of tau or lags and a corresponding set of variances.
    #print(tau)
    #print(variancetau)

    # plot the log of those variances against the log of tau and get the slope
    m = polyfit(log10(tau),log10(variancetau),1)

    hurst = m[0] / 2

    return hurst

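Dr. Chan does not give any code on this page (I believe he works in MATLAB rather than Python), so I had to put together my own code from the notes he gives in his blog and from his answers to questions readers asked there.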

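Dr. Chan points out that if $z$ is the log price, then the volatility sampled at intervals of $\tau$ is $\text{volatility}(\tau) = \sqrt{\operatorname{Var}\big(z(t) - z(t-\tau)\big)}$. To me, another way of describing volatility is the standard deviation, so $\operatorname{std}(\tau) = \sqrt{\operatorname{Var}\big(z(t) - z(t-\tau)\big)}$.
The standard deviation is just the square root of the variance, so $\operatorname{var}(\tau) = \operatorname{Var}\big(z(t) - z(t-\tau)\big)$.
Dr. Chan then says that, in general, we can write $\operatorname{Var}(\tau) \propto \tau^{2H}$, where $H$ is the Hurst exponent.
Hence $\operatorname{Var}\big(z(t) - z(t-\tau)\big) \propto \tau^{2H}$.
Taking the log of each side gives $\log \operatorname{Var}\big(z(t) - z(t-\tau)\big) = 2H \log \tau + c$ for some constant $c$.
So the slope of $\log \operatorname{Var}\big(z(t) - z(t-\tau)\big)$ plotted against $\log \tau$ is $2H$, and half of that slope is the Hurst exponent. That slope is exactly what polyfit returns when fitting the log of the corresponding set of variances against the log of the set of tau values.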

If you run this function and compare its result with that of the QuantStart function, the two should be the same. Not sure whether this helps.
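One way to check that claim is a minimal sketch that runs both formulations on the same synthetic random walk. The data, seed and the compact rewrites hurst_quantstart and hurst_ernie_chan (with lags as a parameter) are assumptions for illustration. Because np.std(x)**2 equals np.var(x) and the slope of a log-log fit does not depend on the logarithm base, the two estimates should agree to floating-point precision:

```python
import numpy as np

def hurst_quantstart(ts, lags=range(2, 100)):
    """Question's version: sqrt of std, natural log, slope times 2."""
    ts = np.asarray(ts, dtype=float)
    tau = [np.sqrt(np.std(ts[lag:] - ts[:-lag])) for lag in lags]
    return 2.0 * np.polyfit(np.log(list(lags)), np.log(tau), 1)[0]

def hurst_ernie_chan(p, lags=range(2, 100)):
    """Answer 1's version: variance, log10, slope divided by 2."""
    p = np.asarray(p, dtype=float)
    variancetau = [np.var(p[lag:] - p[:-lag]) for lag in lags]
    return np.polyfit(np.log10(list(lags)), np.log10(variancetau), 1)[0] / 2.0

rng = np.random.default_rng(42)
series = np.cumsum(rng.standard_normal(5_000))  # synthetic random walk

h1 = hurst_quantstart(series)
h2 = hurst_ernie_chan(series)
print(h1, h2)
```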

Answer 2

All that is happening here is a change of mathematical notation.

I will define

d = subtract(ts[lag:], ts[:-lag])

Then it is clear that

np.log(np.std(d)**2) == np.log(np.var(d))
np.log(np.std(d)) == .5*np.log(np.var(d))

Then you have the equivalence

2*np.log(np.sqrt(np.std(d))) == .5*np.log(np.var(d))

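The slope returned by polyfit scales in proportion to its y input: multiplying log(tau) by a constant multiplies the fitted slope by the same constant, which is why the factors of 0.5 and 2 cancel.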
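These identities can be checked numerically. A minimal sketch, assuming a random vector d as a stand-in for the lagged differences and using np.isclose instead of == for floating-point comparisons; the second part shows the linear scaling of the polyfit slope with its y input:

```python
import numpy as np

rng = np.random.default_rng(1)
d = rng.standard_normal(1_000)  # stand-in for subtract(ts[lag:], ts[:-lag])

# Pointwise log identities from the answer
identities = [
    np.isclose(np.log(np.std(d) ** 2), np.log(np.var(d))),
    np.isclose(np.log(np.std(d)), 0.5 * np.log(np.var(d))),
    np.isclose(2 * np.log(np.sqrt(np.std(d))), 0.5 * np.log(np.var(d))),
]

# polyfit is linear in y: scaling y by 0.25 scales the fitted slope by 0.25
x = np.log(np.arange(2, 100))
y = rng.standard_normal(x.size)
slope = np.polyfit(x, y, 1)[0]
slope_scaled = np.polyfit(x, 0.25 * y, 1)[0]

print(identities, slope, slope_scaled)
```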

Answer 3

According to the intuitive definition in Ernest Chan's "Algorithmic Trading" (p. 44):


"Intuitively, a 'stationary' price series means that the prices diffuse more slowly than a geometric random walk."

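One wants to check how the variance of the time series' lagged differences grows as the lag increases. This is because, for normal distributions (and log prices are assumed to be normal, to some extent), the variance of a sum of independent normally distributed variables is the sum of the component variances, so for a random walk the variance grows linearly with the lag. Per Ernest Chan's quote, for a mean-reverting process the realized variance will be smaller than this theoretical prediction.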
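A minimal sketch of that argument, using an AR(1) process with phi = 0.5 as the mean-reverting example (the process, its parameter, the seed and the helper name lagged_variance are all assumptions for illustration). For the random walk, the variance of the lag-50 differences should be roughly 50 times the lag-1 variance; for the AR(1) series it stays bounded, so the ratio is far smaller:

```python
import numpy as np

def lagged_variance(x, lag):
    """Variance of x(t) - x(t - lag)."""
    return np.var(x[lag:] - x[:-lag])

rng = np.random.default_rng(2)
n = 50_000
shocks = rng.standard_normal(n)

# Random walk: each difference over `lag` steps is a sum of `lag` independent N(0, 1) steps
random_walk = np.cumsum(shocks)

# Mean-reverting AR(1): x[t] = phi * x[t-1] + shock[t]
phi = 0.5
ar1 = np.empty(n)
ar1[0] = shocks[0]
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + shocks[t]

ratios = {}
for name, x in [("random walk", random_walk), ("AR(1)", ar1)]:
    ratios[name] = lagged_variance(x, 50) / lagged_variance(x, 1)
    print(f"{name}: Var(lag 50) / Var(lag 1) = {ratios[name]:.1f}")
```

For this AR(1) the theoretical ratio is about 2, since the variance of its differences levels off at twice the process variance.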

Putting this into code:

import numpy as np
from scipy.stats import linregress

def hurst(p, l):
    """
    Arguments:
        p: ndarray -- the price series to be tested
        l: list of integers or an integer -- lag(s) to test for mean reversion
    Returns:
        Hurst exponent
    """
    if isinstance(l, int):
        lags = [1, l]
    else:
        lags = l
    assert lags[-1] >= 2, "Lag in prices must be greater than or equal to 2"
    print(f"Price lags of {lags[1:]} are included")
    # Log prices: the variance of their lagged differences should grow like lag**(2H)
    lp = np.log(np.asarray(p, dtype=float))
    var = [np.var(lp[lag:] - lp[:-lag]) for lag in lags]
    # The slope of log(variance) against log(lag) is 2H
    hr = linregress(np.log(lags), np.log(var))[0] / 2
    return hr
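A possible usage sketch for this approach on synthetic prices. To stay self-contained it re-implements the same idea as the hypothetical hurst_logvar, using np.polyfit in place of scipy.stats.linregress (both give the least-squares slope). The price paths are assumptions: a geometric random walk should give H near 0.5, while a price whose log follows a mean-reverting AR(1) (phi = 0.9, also an assumption) diffuses more slowly and should give a clearly smaller H:

```python
import numpy as np

def hurst_logvar(prices, lags=range(2, 100)):
    """H = slope of log Var(log p(t+lag) - log p(t)) against log(lag), divided by 2."""
    lp = np.log(np.asarray(prices, dtype=float))
    lags = np.asarray(list(lags))
    variances = [np.var(lp[k:] - lp[:-k]) for k in lags]
    return np.polyfit(np.log(lags), np.log(variances), 1)[0] / 2.0

rng = np.random.default_rng(3)
n = 100_000
shocks = 0.01 * rng.standard_normal(n)

# Geometric random walk: the log price is a cumulative sum of shocks
gbm_prices = 100.0 * np.exp(np.cumsum(shocks))

# Mean-reverting: the log price deviation follows an AR(1) around log(100)
phi = 0.9
dev = np.zeros(n)
for t in range(1, n):
    dev[t] = phi * dev[t - 1] + shocks[t]
mr_prices = 100.0 * np.exp(dev)

h_gbm = hurst_logvar(gbm_prices)
h_mr = hurst_logvar(mr_prices)
print(f"geometric random walk: {h_gbm:.3f}, mean-reverting: {h_mr:.3f}")
```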

Answer 4

The code posted by the OP is correct.

The source of the confusion is that it first takes a square root and then counteracts it by multiplying the slope (returned by polyfit) by 2.

For a more detailed explanation, read on.

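tau is computed with an "extra" square root, and then its log is taken. Since log(sqrt(x)) = log(x^0.5) = 0.5 * log(x) (this is the key), polyfit is now fitting y values that carry an "extra" factor of 0.5, so the slope it returns is also multiplied by 0.5. Multiplying by two on return (return poly[0]*2.0) cancels that initial, seemingly extra, 0.5. What remains is the slope of log(std) against log(lag), which is H because std(τ) ∝ τ^H.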
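The factor bookkeeping can be checked directly. A minimal sketch on a synthetic random walk (the data and seed are assumptions): fitting log(sqrt(std)), log(std) and log(var) against log(lag) gives slopes in the ratio 1 : 2 : 4, so doubling the first, keeping the second, or halving the third all yield the same H estimate:

```python
import numpy as np

rng = np.random.default_rng(4)
ts = np.cumsum(rng.standard_normal(10_000))  # synthetic random walk
lags = np.arange(2, 100)
log_lags = np.log(lags)

std_tau = np.array([np.std(ts[lag:] - ts[:-lag]) for lag in lags])

slope_sqrt_std = np.polyfit(log_lags, np.log(np.sqrt(std_tau)), 1)[0]  # about H / 2
slope_std = np.polyfit(log_lags, np.log(std_tau), 1)[0]                # about H
slope_var = np.polyfit(log_lags, np.log(std_tau ** 2), 1)[0]           # about 2H

print(2.0 * slope_sqrt_std, slope_std, slope_var / 2.0)
```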

Hope this makes it clearer.

Hurst exponent in Python
