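I have set up a simulated example below. Setup: I have weekly data, say 6 years of it, with around 1000 stocks in some weeks and fewer than 1000 in others. At time t0 I randomly pick 75 stocks. At t1 some stocks die (with probability p, e.g. they go out of fashion) or leave the index (for structural reasons such as a merger). I need to simulate the holdings so that I have 75 stocks every week. Each week some number of stocks die (anywhere between 0 and 75), and I pick new stocks that are not among the existing 75. I also check whether a stock has dropped out for structural reasons. Each week I compute the return of the 75 stocks.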
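Question: are there any obvious ways to improve the speed? I started with Pandas objects (groupby, sort), which was slow. I have not tried parallelising the loops. I would be most interested to hear whether I should use numba (although it has no np.in1d function) or whether there is a faster way to do the shuffle (the shuffle is really all I need). I have also considered creating a fixed array of all stock IDs filled with NaN; the problem there is that I need 75 names, so I would still have to filter out those NaNs every week.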
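On the shuffle specifically, one possible alternative is to draw the picked positions directly instead of building a boolean mask and shuffling it. The sketch below is only an illustration under assumptions: `n_universe`, `n_pick` and the seeded `rng` are made-up names and values, and it relies on `numpy.random.Generator.choice` with `replace=False`. Whether it is actually faster for this workload would need to be timed.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed fixed only to make the sketch repeatable

n_universe = 1000  # stocks available this week (illustrative value)
n_pick = 75        # stocks to draw (illustrative value)

# Approach from the question: build a boolean mask and shuffle it.
mask = np.zeros(n_universe, dtype=bool)
mask[:n_pick] = True
rng.shuffle(mask)
picked_by_mask = np.flatnonzero(mask)

# Alternative: draw the positions directly, without replacement.
# shuffle=False skips the final reordering of the sample, which is fine
# here because only the set of picked positions matters.
picked_by_choice = rng.choice(n_universe, size=n_pick, replace=False, shuffle=False)

print(picked_by_mask.size, np.unique(picked_by_choice).size)
```

Both approaches give `n_pick` distinct positions in the week's slice; the second one returns integer positions rather than a mask, so it would be used as `stock_id_week[picked_by_choice]`.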
Maybe this question is too detailed for this forum; if so, I apologise.
Code:
from timeit import default_timer
import numpy as np
# Create dataset
n_weeks = 312 # Approximately 6 years of weekly data
n_stocks = np.random.normal(1000, 5, n_weeks).astype(dtype=np.uint16) # Around 1000 stocks every week but not fixed
idx_new_week = np.cumsum(np.hstack((0, n_stocks)))
# We give each stock a stock id (1..n within each week)
n_obs = n_stocks.sum()
stock_id = np.ones([n_obs], dtype=np.uint16)
for j in range(1, n_weeks+1):
    stock_id[idx_new_week[j-1]:idx_new_week[j]] = np.arange(1, n_stocks[j-1] + 1)
stock_rtn = np.random.normal(0, 0.25/np.sqrt(52), n_obs) # Simulated forward (one week ahead) return for each stock
# Simulation part
# Week 0: pick 75 stocks at random
# Week n >= 1: a stock dies for one of two reasons
# 1) randomness (probability 'p')
# 2) structural event (could be a merger, falling out of the index).
# We cannot assume that it is always the high stock id which dies for structural reasons (as it looks like here)
# If a stock dies we randomly pick a replacement from the stocks outside the portfolio (excluding the ones that die this week)
n_sim = 100 # I want this to be 1 million
n_stock_cand = 75 # For this example we pick 75 stocks
p_survial = 0.90 # Weekly survival probability
# The weekly portfolio returns
pf_rtn = np.zeros([n_weeks, n_sim])
start = default_timer()
for k in range(0, n_sim):
    # Randomly choose n_stock_cand stocks at time zero
    boolean_list = np.array([False] * (n_stocks[0] - n_stock_cand) + [True] * n_stock_cand)
    np.random.shuffle(boolean_list) # Shuffle the mask
    stock_id_this_week = stock_id[idx_new_week[0]:idx_new_week[1]][boolean_list]
    stock_rtn_this_week = stock_rtn[idx_new_week[0]:idx_new_week[1]][boolean_list]
    # This part only simulates the Buzz portfolio names; later we simulate returns from specific holdings of the 75 names
    for j in range(1, n_weeks):
        pf_rtn[j-1, k] = stock_rtn_this_week.mean()
        # Draw which stocks survive the random death
        boolean_keep_stocks = np.random.rand(n_stock_cand) < p_survial
        # Universe and forward returns for week j, used to check if a stock is still part of the universe
        stock_cand_temp = stock_id[idx_new_week[j]:idx_new_week[j+1]]
        stock_rtn_temp = stock_rtn[idx_new_week[j]:idx_new_week[j+1]]
        boolean_keep_stocks = boolean_keep_stocks & np.in1d(stock_id_this_week, stock_cand_temp, assume_unique=True)
        n_stocks_to_replace = n_stock_cand - boolean_keep_stocks.sum() # Number of new stocks to pick this week
        # Locate the survivors in week j so that they carry week j returns (order might differ)
        boolean_kept = np.in1d(stock_cand_temp, stock_id_this_week[boolean_keep_stocks], assume_unique=True)
        if n_stocks_to_replace > 0:
            # We have to pick from stocks which are not part of the portfolio already
            boolean_cand = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True, invert=True)
            n_stocks_to_pick_from = boolean_cand.sum()
            boolean_list = np.array([False] * (n_stocks_to_pick_from - n_stocks_to_replace) + [True] * n_stocks_to_replace)
            np.random.shuffle(boolean_list) # Shuffle the mask
            # First avoid picking the same stock twice, then pick from the unique candidate list
            stock_id_new = stock_cand_temp[boolean_cand][boolean_list] # The new stocks
            stock_rtn_new = stock_rtn_temp[boolean_cand][boolean_list] # and their returns
            stock_id_this_week = np.hstack((stock_cand_temp[boolean_kept], stock_id_new))
            stock_rtn_this_week = np.hstack((stock_rtn_temp[boolean_kept], stock_rtn_new))
        else:
            # No replacement of stocks: all survive
            stock_id_this_week = stock_cand_temp[boolean_kept]
            stock_rtn_this_week = stock_rtn_temp[boolean_kept]
    # PnL last period
    pf_rtn[n_weeks-1, k] = stock_rtn_this_week.mean()
print(default_timer() - start)
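The fixed-array idea mentioned in the question can be sketched as a dense week-by-id table of returns with NaN where a stock is absent, so that the membership test becomes plain indexing instead of np.in1d. This is only a sketch under assumptions: `build_lookup`, `simulate_path` and the small demo dataset are hypothetical names and values introduced here, stock ids are assumed to be small non-negative integers, and no speed-up has been measured. Because the inner loop uses only indexing and comparisons, it may be easier to port to numba, but that has not been checked.

```python
import numpy as np

def build_lookup(stock_id, stock_rtn, idx_new_week):
    """Return a (n_weeks, max_id + 1) table of returns, NaN where a stock is absent."""
    n_weeks = len(idx_new_week) - 1
    rtn = np.full((n_weeks, int(stock_id.max()) + 1), np.nan)
    week = np.repeat(np.arange(n_weeks), np.diff(idx_new_week))
    rtn[week, stock_id] = stock_rtn
    return rtn

def simulate_path(rtn, n_pick, p_survive, rng):
    """Simulate one portfolio path and return its weekly mean returns."""
    n_weeks = rtn.shape[0]
    out = np.empty(n_weeks)
    alive = ~np.isnan(rtn)  # alive[w, i]: stock i is in week w's universe
    held = rng.choice(np.flatnonzero(alive[0]), size=n_pick, replace=False)
    out[0] = rtn[0, held].mean()
    for w in range(1, n_weeks):
        # A holding is kept if it survives the random draw and is still in the universe
        keep = (rng.random(n_pick) < p_survive) & alive[w, held]
        # Candidates: in this week's universe and not held last week
        cand = alive[w].copy()
        cand[held] = False
        new = rng.choice(np.flatnonzero(cand), size=n_pick - keep.sum(), replace=False)
        held = np.concatenate((held[keep], new))
        out[w] = rtn[w, held].mean()
    return out

# Small illustrative dataset (20 weeks, about 1000 stocks per week)
rng = np.random.default_rng(0)
n_weeks = 20
n_stocks = rng.integers(990, 1010, n_weeks)
idx_new_week = np.concatenate(([0], np.cumsum(n_stocks)))
stock_id = np.concatenate([np.arange(1, n + 1) for n in n_stocks])
stock_rtn = rng.normal(0, 0.25 / np.sqrt(52), stock_id.size)

rtn = build_lookup(stock_id, stock_rtn, idx_new_week)
path = simulate_path(rtn, n_pick=75, p_survive=0.90, rng=rng)
print(path.shape)
```

The NaN filtering happens once, when `alive` is built, rather than every week inside the loop. The table costs one float per (week, id) pair, so if the real ids are large or sparse they would first need to be remapped to a compact range.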