Question

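I have set up a simulated example below. Setup: I have weekly data, say 6 years of it, with around 1000 stocks in some weeks and fewer than 1000 in others. At time t0 I randomly pick 75 stocks. At t1, some stocks die (with probability p, going out of fashion) or leave the index (for structural reasons such as a merger). I need to simulate the stocks so that I hold 75 stocks every week. Each week some stocks die (between 0 and 75), and I pick new stocks that are not among the existing 75. I also check whether a stock has dropped out for structural reasons. Each week I compute the return of the 75 stocks.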

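Question: is there an obvious way to improve the speed? I started with Pandas objects (groupby, sorting), which slowed things down. I have not tried parallel loops. I would be most interested to hear whether I should use numba (although it has no np.in1d function), or whether there is a faster way to do the shuffling (the shuffle is really all I need). I have also considered creating a fixed array of all stock IDs padded with NaN; the problem there is that I need 75 names, so I would still have to filter out those NaNs every week.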
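To make the shuffling point concrete, here is a minimal sketch comparing the boolean-mask shuffle with drawing positions directly without replacement. It assumes NumPy's `Generator` API (`np.random.default_rng`) is acceptable; the names `n_universe` and `n_pick` and the seed are illustrative only, and whether the second form is actually faster would need to be timed.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch reproducible

n_universe = 1000  # illustrative universe size for one week
n_pick = 75        # portfolio size from the question

# Approach used in the question: shuffle a boolean mask holding n_pick True values.
mask = np.zeros(n_universe, dtype=bool)
mask[:n_pick] = True
rng.shuffle(mask)
picked_by_mask = np.flatnonzero(mask)  # positions of the selected stocks

# Alternative: draw n_pick distinct positions directly, without replacement.
picked_by_choice = rng.choice(n_universe, size=n_pick, replace=False)
```

Both variants yield 75 distinct positions into the week's slice; the second skips building and shuffling a full-length mask.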

Perhaps this question is too detailed for this forum; if so, I apologize.

Code:

from timeit import default_timer
import numpy as np

# Create dataset
n_weeks = 312 # Approximately 6 years of weekly data
n_stocks = np.random.normal(1000, 5, n_weeks).astype(dtype=np.uint16) # Around 1000 stocks every week but not fixed
idx_new_week = np.cumsum(np.hstack((0, n_stocks)))

# We give each stock a stock ID
n_obs = n_stocks.sum()
stock_id = np.ones([n_obs],  dtype=np.uint16)
for j in range(1, n_weeks+1):    
    stock_id[idx_new_week[j-1]:idx_new_week[j]] = np.cumsum(np.ones(n_stocks[j-1]))
stock_rtn = np.random.normal(0, 0.25/np.sqrt(52), n_obs) # Simulated forward (one week ahead) return for each stock 

# Simulation part
# Week 0: randomly pick 75 stocks
# Week n >= 1: a stock dies for one of two reasons
# 1) randomness (probability 'p')
# 2) a structural event (could be a merger, falling out of the index).
# We cannot assume that it is always the highest stock ID that dies for structural reasons (as it looks like here)
# If a stock dies, we randomly pick a replacement from the week's stock dataset (not including the ones that die this week)

n_sim = 100 # I want this to be 1 million
n_stock_cand = 75 # For this example we pick 75 stocks
p_survial = 0.90

# The weekly periodic returns
pf_rtn = np.zeros([n_weeks, n_sim])

start = default_timer()
for k in range(0, n_sim):

    # Randomly choose n_stock_cand stocks at time zero
    boolean_list = np.array([False] * (n_stocks[0] - n_stock_cand) + [True] * n_stock_cand)
    np.random.shuffle(boolean_list) # Shuffle the list

    stock_id_this_week = stock_id[idx_new_week[0]:idx_new_week[1]][boolean_list] 
    stock_rtn_this_week = stock_rtn[idx_new_week[0]:idx_new_week[1]][boolean_list]

    # This part only simulates the Buzz portfolio names - later we simulate returns from specific holdings of the 75 names
    for j in range(1, n_weeks):

        pf_rtn[j-1, k] = stock_rtn_this_week.mean() 

        # Find the number of stocks to keep
        boolean_keep_stocks = np.random.rand(n_stock_cand) < p_survial

        # Next we need to check if a stock is still part of the universe next period  
        stock_cand_temp = stock_id[idx_new_week[j-1]:idx_new_week[j]]
        stock_rtn_temp = stock_rtn[idx_new_week[j-1]:idx_new_week[j]]

        boolean_keep_stocks = (boolean_keep_stocks) & (np.in1d(stock_id_this_week, stock_cand_temp, assume_unique=True)) 
        n_stocks_to_replace = n_stock_cand - boolean_keep_stocks.sum() # Number of new stocks to pick this week

        if n_stocks_to_replace > 0:
            # We have to pick from stocks which are not part of the portfolio already
            boolean_cand = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True, invert=True)
            n_stocks_to_pick_from = boolean_cand.sum()        
            boolean_list = np.array([False] * (n_stocks_to_pick_from - n_stocks_to_replace) + [True] * n_stocks_to_replace)
            np.random.shuffle(boolean_list) # Shuffle the list        

            # First avoid picking the same stock twice, next pick from the unique candidate list
            stock_id_new = stock_cand_temp[boolean_cand][boolean_list] # The new stocks 
            stock_rtn_new = stock_rtn_temp[boolean_cand][boolean_list] # and their returns

            stock_id_this_week = np.hstack((stock_id_this_week[boolean_keep_stocks], stock_id_new))
            stock_rtn_this_week = np.hstack((stock_rtn_this_week[boolean_keep_stocks], stock_rtn_new))
        else:
            # No replacement of stocks / all survive, but the order might differ
            boolean_cand = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True, invert=False)
            stock_id_this_week = stock_cand_temp[boolean_cand]
            stock_rtn_this_week = stock_rtn_temp[boolean_cand]

    # PnL last period
    pf_rtn[n_weeks-1, k] = stock_rtn_this_week.mean() 

print(default_timer() - start)
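The fixed-array idea from the question can be sketched as follows: a dense grid indexed by week and stock ID, with NaN marking absent stocks, turns both the membership test and the candidate search into plain indexing, so no np.in1d call is needed. This is only a sketch under assumptions: stock IDs are small non-negative integers (as in the simulated data), and the names `rtn_grid`, `still_listed`, and `candidates` are hypothetical helpers, not part of the code above.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen only for reproducibility

n_weeks, max_id = 4, 10  # tiny illustrative sizes

# rtn_grid[w, i] holds the return of stock ID i in week w, or NaN if it is absent.
rtn_grid = np.full((n_weeks, max_id + 1), np.nan)
for w in range(n_weeks):
    n_alive = rng.integers(7, max_id + 1)  # between 7 and 10 stocks listed this week
    ids = np.arange(1, n_alive + 1)        # IDs 1..n_alive, as in the simulated data
    rtn_grid[w, ids] = rng.normal(0.0, 0.25 / np.sqrt(52), n_alive)


def still_listed(rtn_grid, week, held_ids):
    """Boolean mask: which of the held IDs have a return (are listed) in `week`."""
    return ~np.isnan(rtn_grid[week, held_ids])


def candidates(rtn_grid, week, held_ids):
    """Sorted IDs that are listed in `week` and not currently held."""
    listed = ~np.isnan(rtn_grid[week])
    listed[held_ids] = False
    return np.flatnonzero(listed)


held = np.array([2, 5, 9, 10])            # hypothetical current holdings
keep = still_listed(rtn_grid, 1, held)    # replaces the first np.in1d call
pool = candidates(rtn_grid, 1, held)      # replaces the inverted np.in1d call
```

For 312 weeks and roughly 1000 IDs the grid is small, and looking up the 75 held names by ID avoids scanning for NaNs across the whole week. The functions use only basic indexing and `np.isnan`, which may make them easier to port to numba, though that is untested here.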

Python - path-dependent simulation

0 answers: