我有一个装有重量的熊猫df。行包含日期,列包含资产名称。每行总和为1。
我要跑步
df_with_stocks_weight.apply(rescale_w, weight_min=0.01, weight_max=0.30)
以进行更改,以使权重仍为1,但最小值为1%,最大值为30%。我尝试使用下面的函数,但索引出现问题:计算出的值正确,但是输出指向错误的资产!
def rescale_w(row_input, weight_min, weight_max):
'''
:param row_input: a row from a pandas df
:param weight_min: the floor. type float.
:param weight_max: the cap. type float.
:return: a pandas row where weights are adjusted to specify min max.
step 1:
while any asset has weight above weight_max,
set that asset's weight to == weight_max
and distribute the leftovers to all other assets (whose weight are >0)
in accordance with their weight.
step 2:
if there is a positive weight below min_weight,
force it to == min_weight
by stealing from every other asset
(except those whose weight == max_weight).
note that the function produce strange output with few assets.
for example with 3 assets and max 30% the sum is 0.90
and if A=50% B=20% and one other asset is 1% then
these are not practical problems as we will analyze on data with many assets.
'''
# rename
w1 = row_input
# na
# script returned many errors regarding na
# so i a fillna(0) here.
# if that will be the final solution, some cleaning up can be done
# eg remove _null objects and remove some assertions.
w1 = w1.fillna(0)
# remove zeroes to get a faster script
w1nz = w1[w1 > 0]
w1z = w1[w1 == 0]
assert len(w1) == len(w1nz) + len(w1z)
assert set(w1nz.index).intersection(set(w1z.index)) == set()
# input must sum to 1
assert abs(w1nz.sum()-1) < 0.001
# only execute if there is at least one notnull value
# below will work with nz
if len(w1nz) > 0:
# step 1: make sure upper threshold is satisfied
while max(w1nz) > weight_max:
# clip at 30%
w2 = w1nz.clip(upper=weight_max)
# calc leftovers from this upper clip
leftover_upper = 1 - w2.sum()
# add leftovers to the untouched, in accordance with weight
w2_touched = w2[w2 == weight_max]
w2_unt = w2[(weight_max > w2) & (w2 > 0)]
w2_unt_added = w2_unt + leftover_upper * w2_unt / w2_unt.sum()
# concat all back
w3 = pd.concat([w2_touched, w2_unt_added], axis=0)
# same index for output and input
#w3 = w3.reindex(w1nz.index) # todo prövar nu att ta bort .reindex överallt. ser om pd löser det själv automatiskt
# rename w3 so that it works in a while loop
w1nz = w3
usestep2 = False
if usestep2:
# step 2: make sure lower threshold is satisfied
if min(w1nz) < weight_min:
# three parts: lower, middle, upper.
# those in "lower" will recieve from those in "middle"
upper = w1nz[w1nz >= weight_max]
middle = w1nz[(w1nz > weight_min) & (w1nz < weight_max)]
lower = w1nz[w1nz <= weight_min]
# assert len
assert (len(upper) + len(middle) + len(lower) == len(w1nz))
# change lower to == weight_min
lower_modified = lower.clip(lower=weight_min)
# the weights given to "lower" is stolen from "middle"
stolen_weigths = lower_modified.sum() - lower.sum()
middle_modified = middle - stolen_weigths * middle / middle.sum()
# concat
w4 = pd.concat([lower_modified,
middle_modified,
upper], axis=0)
# reindex
#w4 = w4.reindex(w1nz.index)
# rename
w1nz = w4
# lastly, concat adjusted nonzero with zero.
w1adj = pd.concat([w1nz, w1z], axis=0)
w1adj = w1adj.reindex(w1.index) # works?
assert (w1adj.index == w1.index).all()
assert abs(w1adj.sum() - 1 < 0.001)
return (w1adj)