Question

I have some code that I wrote in R and would like to translate into Python, but I am new to Python, so I need some help.

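The R code basically simulates 250 random normals, then calculates a kind of geometric (compounded) return and then the maximum drawdown. It does this 10000 times and then combines the results, as shown below.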

mu <- 0.06
sigma <- 0.20
days <- 250
n <- 10000
v <- do.call(rbind,lapply(seq(n),function(y){
  rtns <- rnorm(days,mu/days,sqrt(1/days)*sigma)
  p.rtns <- cumprod(rtns+1)
  p.rtns.md <- min((p.rtns/cummax(c(1,p.rtns))[-1])-1)
  tot.rtn <- p.rtns[days]-1
  c(tot.rtn,p.rtns.md)
}))

Here is my attempt in Python (if you can make it shorter, more eloquent, or more efficient, please suggest it as an answer):

import numpy as np
import pandas as pd
mu = float(0.06)
sigma = float(0.2)
days = float(250)
n = 10000
rtns = np.random.normal(loc=mu/days,scale=(((1/days)**0.5)*sigma),size=days)
rtns1 = rtns+1
prtns = rtns1.cumprod()
totrtn = prtns[len(prtns)-1] -1
h = prtns.tolist()
h.insert(0,float(1))
hdf = pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)]-1))[1:len(h)]]

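This is as far as I got. I am not sure whether hdf correctly reproduces p.rtns.md, and I am not sure how to run this simulation 10000 times.

All suggestions are much appreciated.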

Answer 1

I am not familiar with R, but I can see some general improvements that could be made to your Python code:

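- Use 0.06 rather than float(0.06), because Python infers that a number written with a decimal point is a float.
- For the same reason, the line h.insert(0,float(1)) can be replaced with h.insert(0,1.0).
- You can refer to the last item in an iterable with [-1], the second-to-last with [-2], and so on:
  totrtn = prtns[-1] -1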
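The points in the list above can be illustrated with a small sketch. The list `prices` and its values are made up for the example and are not from the original post. Note also that `days` is best kept as an integer, since it is later passed as an array size, and NumPy expects integer sizes.

```python
mu = 0.06        # a literal with a decimal point is already a float
days = 250       # keep this an int: it is later used as an array size

# Hypothetical values, only to demonstrate negative indexing
prices = [1.0, 1.02, 0.99, 1.05]
last = prices[-1]             # last item
second_to_last = prices[-2]   # second-to-last item
```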

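Python developers usually pick either underscores between words or camelCase and use it consistently. It is also usually better to use full words in variable names rather than to economise on screen space. For example, some of the variables here could be renamed to returns and total_returns (or totalReturns).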

To run the simulation 10000 times, you should use a for loop.
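A minimal sketch of such a loop follows. The helper name `simulate_once`, the use of `np.random.default_rng`, and the seed are assumptions made for this illustration and do not come from the original post. `np.maximum.accumulate` is used as the running maximum, standing in for R's `cummax`.

```python
import numpy as np

def simulate_once(mu, sigma, days, rng):
    """Hypothetical helper: one path's (total return, maximum drawdown)."""
    returns = rng.normal(loc=mu / days, scale=sigma / np.sqrt(days), size=days)
    prices = np.cumprod(1.0 + returns)
    # Running peak, with the starting value 1 included, as c(1, p.rtns) does in R
    peaks = np.maximum.accumulate(np.concatenate(([1.0], prices)))[1:]
    max_drawdown = np.min(prices / peaks - 1.0)
    return prices[-1] - 1.0, max_drawdown

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 10000
results = []
for _ in range(n):
    results.append(simulate_once(0.06, 0.20, 250, rng))
results = np.array(results)  # shape (n, 2), like the rbind-ed matrix v in R
```

Each row of `results` holds the total return and the maximum drawdown of one simulated path.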

Answer 2

First, your last line of code:

hdf = pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)]-1))[1:len(h)]]

cannot be right. Judging by your R code, it should probably be:

hdf = (pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)])-1)[1:len(h)]

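Second, c(1,p.rtns) can be replaced with np.hstack((1, prtns)) instead of converting the np.array to a list. Note that np.hstack takes a single sequence of arrays, hence the double parentheses.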

Third, it looks like you are using pandas only for cummax(). It is not hard to implement one yourself, for example:

def cummax(a):
    ac=a.copy()
    if a.size>0:
        max_idx=np.argmax(a)
        ac[max_idx:]=np.max(ac)
        ac[:max_idx]=cummax(ac[:max_idx])
    else:
        pass
    return ac

and

>>> a=np.random.randint(0,20,size=10)
>>> a
array([15, 15, 15,  8,  5, 14,  6, 18,  9,  1])
>>> cummax(a)
array([15, 15, 15, 15, 15, 15, 15, 18, 18, 18])
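As an alternative to a hand-written function, NumPy's `np.maximum.accumulate` computes the same running maximum. This is a swapped-in technique, not what the answer above uses; the sketch below reuses the array from the session above.

```python
import numpy as np

a = np.array([15, 15, 15, 8, 5, 14, 6, 18, 9, 1])
# Running (cumulative) maximum, equivalent to cummax(a) above
running_max = np.maximum.accumulate(a)
print(running_max)  # [15 15 15 15 15 15 15 18 18 18]
```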

Putting it all together:

def run_simulation(mu, sigma, days, n):
    result=[]
    for i in range(n):
        rtns = np.random.normal(loc=1.*mu/days,
                    scale=(((1./days)**0.5)*sigma),
                    size=days)
        p_rtns = (rtns+1).cumprod()
        tot_rtn = p_rtns[-1]-1
        #looks like you want the last element, rather than the 2nd from the last as you did
        p_rtns_md = (p_rtns/cummax(np.hstack((1.,p_rtns)))[1:]-1).min()
        #prepend 1 (the starting value) as c(1,p.rtns) does in R, then skip that first element with [1:]; python is different from R for that.
        result.append((tot_rtn, p_rtns_md))
    return result

and

>>> run_simulation(0.06, 0.2, 250,10)
[(0.096077511394818016, -0.16621830496112056), (0.73729333554192, -0.13566124517484235), (0.087761655465907973, -0.17862916081223446), (0.07434851091082928, -0.15972961033789046), (-0.094464694393288307, -0.2317397117033817), (-0.090720761054686627, -0.1454002204893271), (0.02221364097529932, -0.15606214341947877), (-0.12362835704696629, -0.24323096421682033), (0.023089144896788261, -0.16916790589553599), (0.39777037782177493, -0.10524624505023494)]

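There is actually no need for a loop, because we can work in 2D by generating a 2D array of Gaussian random variables (change size=days to size=(days, n)). Avoiding the loop will most likely be faster. However, that requires a different cummax() function, since the one shown here is limited to 1D. But cummax() in R is also limited to 1D (not exactly: if you pass it a 2D array, the array is flattened). So, to keep things simple and comparable between Python and R, let's go with the loop version.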
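For reference, the 2D variant described above could look like the following sketch. It is not part of the original answer: the function name `run_simulation_vectorized`, the `seed` parameter, and the use of `np.random.default_rng` are assumptions, and `np.maximum.accumulate` with `axis=0` takes the place of the 1D cummax().

```python
import numpy as np

def run_simulation_vectorized(mu, sigma, days, n, seed=None):
    """Hypothetical loop-free version: one column per simulated path."""
    rng = np.random.default_rng(seed)
    rtns = rng.normal(loc=mu / days, scale=sigma / np.sqrt(days), size=(days, n))
    p_rtns = np.cumprod(rtns + 1.0, axis=0)
    # Prepend a row of ones (the starting value), take the running maximum
    # down each column, then drop the prepended row again
    start = np.ones((1, n))
    peaks = np.maximum.accumulate(np.vstack((start, p_rtns)), axis=0)[1:]
    max_drawdown = (p_rtns / peaks - 1.0).min(axis=0)
    tot_rtn = p_rtns[-1] - 1.0
    # One row per simulation: (total return, maximum drawdown)
    return np.column_stack((tot_rtn, max_drawdown))

v = run_simulation_vectorized(0.06, 0.2, 250, 10000, seed=0)
print(v.shape)  # (10000, 2)
```

The result has the same layout as the matrix v built by the R code, with one row per simulation.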

Converting R to Python
