pandas .describe() returns wrong column values in the table

Time: 2020-01-14 12:36:39

Tags: python-3.x pandas dataframe

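Look at the gld_weight column in Figure 1. It shows completely wrong values. btc_weight + gld_weight should always add up to 1. But when I use the describe function, why doesn't the gld_weight column correspond to the row values that are returned?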

Figure 1: [image not available]

Figure 2: [image not available]

Figure 3: [image not available]

Here is my source code:

import numpy as np
import pandas as pd
from pandas_datareader import data as wb
import matplotlib.pyplot as plt

assets = ['BTC-USD', 'GLD']
mydata = pd.DataFrame()

for asset in assets:
    mydata[asset] = wb.DataReader(asset, data_source='yahoo', start='2015-1-1')['Close']

cleandata = mydata.dropna()
log_returns = np.log(cleandata/cleandata.shift(1))

annual_log_returns = log_returns.mean() * 252 * 100
annual_log_returns

annual_cov = log_returns.cov() * 252
annual_cov
pfolio_returns = []
pfolio_volatility = []
btc_weight = []
gld_weight = []

for x in range(1000):
    weights = np.random.random(2)
    weights[0] = weights[0]/np.sum(weights)
    weights[1] = weights[1]/np.sum(weights)
    weights /= np.sum(weights)
    btc_weight.append(weights[0])
    gld_weight.append(weights[1])


    pfolio_returns.append(np.dot(annual_log_returns, weights))
    pfolio_volatility.append(np.sqrt(np.dot(weights.T, np.dot(annual_cov, weights))))

pfolio_returns
pfolio_volatility
npfolio_returns = np.array(pfolio_returns)
npfolio_volatility = np.array(pfolio_volatility)

new_portfolio = pd.DataFrame({
    'Returns': npfolio_returns,
    'Volatility': npfolio_volatility,
    'btc_weight': btc_weight,
    'gld_weight': gld_weight
})

2 Answers:

Answer 0: (score: 0)

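I'm not 100% sure I understood your question correctly, but the problem may be that you did not assign the output back to a variable, so it was never saved. Try adapting this code: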

new_portfolio = new_portfolio.sort_values(by="Returns")

Or set the inplace parameter to True.
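A minimal sketch of the point above, using a hypothetical three-row frame in place of new_portfolio (the values are made up for illustration): sort_values returns a new DataFrame and leaves the original alone unless the result is reassigned or inplace=True is passed.

```python
import pandas as pd

# Hypothetical three-row frame standing in for new_portfolio.
df = pd.DataFrame({"Returns": [0.3, 0.1, 0.2]})

df.sort_values(by="Returns")               # result is discarded, df is unchanged
unchanged_order = list(df.index)

sorted_df = df.sort_values(by="Returns")    # keep the sorted copy
df.sort_values(by="Returns", inplace=True)  # or sort df itself

print(unchanged_order)
print(list(sorted_df.index))
print(list(df.index))
```

Whether this is what the question was about is uncertain; it only illustrates why a bare call to sort_values appears to do nothing.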

Answer 1: (score: 0)

Short answer:

The problem is in the for loop, where the initial weight values are normalized. For the fix, see Update 1 below.

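Background on how I got to the solution:

At first glance the OP's code appears to be in order, and the values in the arrays match what the code as written asks for. While testing, range(1000) turned out to be troublesome, because the large number of random results makes it hard to keep an overview of the values. That matters all the more because the question was phrased as a transformation problem, so mixed-up x/y axis values or other kinds of transformation errors would be hard to investigate.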

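  1. To get around this, I used static values for annual_log_returns and annual_cov, as can be seen in the code.

  2. Then I stored every output in a variable and printed it, so the values are fixed in place and cannot change later during processing. Printing an expression directly may give different results at run time because the arrays are not fixed (Pavel Klammert suggests the same in his answer).

  3. After feedback in the comments, I figured out what the OP meant by "the values are wrong", and then focused on how the values used to fill the arrays are created.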

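The cause of the "throws wrong values" problem:

Using weights[0] = weights[0]/np.sum(weights) replaces the original weights[0] value with the new one, and that new value is then used as input for weights[1] = weights[1]/np.sum(weights). After these two lines the two weights therefore do not sum to 1. The following weights /= np.sum(weights) rescales them so that they do sum to 1, but their proportions no longer match the original random draw.

  1. Rename weights[0] and weights[1] to 'a' and 'b' in both places, right after the weights [0] and [1] values are created. This prevents the initial weights values from being overwritten, and the result then comes out as planned.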

Problem solved.
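The overwrite described above can be reproduced with fixed numbers. This is a small sketch, assuming a hand-picked pair [1.0, 3.0] in place of np.random.random(2); the names raw, w, after_two_steps and expected are illustrative only.

```python
import numpy as np

raw = np.array([1.0, 3.0])  # fixed stand-in for np.random.random(2)

# Element-by-element version from the question.
w = raw.copy()
w[0] = w[0] / np.sum(w)    # 1 / 4 = 0.25, so w is now [0.25, 3.0]
w[1] = w[1] / np.sum(w)    # 3 / 3.25, computed from the already-changed w[0]
after_two_steps = w.sum()  # no longer 1
w /= np.sum(w)             # rescales the sum to 1, but the proportions have shifted

# Normalising once from the untouched values.
expected = raw / raw.sum()  # [0.25, 0.75]

print(after_two_steps, w, expected)
```

The last in-place division hides the problem as far as the sum is concerned, which is why the weight columns still add up to 1 per row even though the individual weights differ from a plain normalisation.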


import numpy as np
import pandas as pd

pfolio_returns = []
pfolio_volatility = []
btc_weight = []
gld_weight = []

annual_log_returns = [0.69, 0.71]
annual_cov = 0.73

ranger = 5

for x in range(ranger):
    weights = np.random.random(2)
    weights[0] = weights[0]/np.sum(weights)
    weights[1] = weights[1]/np.sum(weights)
    weights /= np.sum(weights)
    btc_weight.append(weights[0])
    gld_weight.append(weights[1])


    pfolio_returns.append(np.dot(annual_log_returns, weights))
    pfolio_volatility.append(np.sqrt(np.dot(weights.T, np.dot(annual_cov, weights))))

print (weights[0])
print (weights[1])
print (weights)

#print (pfolio_returns)
#print (pfolio_volatility)

npfolio_returns    = np.array(pfolio_returns)
npfolio_volatility = np.array(pfolio_volatility)

#df = pd.DataFrame(array, index = row_names, columns=colomn_names, dtype = dtype)

new_portfolio = pd.DataFrame({'Returns': npfolio_returns, 'Volatility': npfolio_volatility, 'btc_weight': btc_weight, 'gld_weight': gld_weight})

print (new_portfolio, '\n')

sort = new_portfolio.sort_values(by='Returns')

sort_max_gld_weight = sort.loc[ranger-1, 'gld_weight']

print ('Sort:\n', sort, '\n')

print ('sort max_gld_weight : "%s"\n' % sort_max_gld_weight)  # only the maximum if the row labelled ranger-1 ("999" in the question) holds the highest gld_weight, which it usually does not

sort_max_gld_weight = sort.max(axis=0)['gld_weight']  # column-wise maximum of 'gld_weight'

print ('sort max_gld_weight : "%s"\n' % sort_max_gld_weight)  # prints the maximum of the 'gld_weight' column

desc = new_portfolio.describe()

desc_max_gld_weight =desc.loc['max', 'gld_weight']

print ('Describe:\n', desc, '\n')
print ('desc max_gld_weight : "%s"\n' % desc_max_gld_weight)

max_val_gld = new_portfolio.loc[new_portfolio['gld_weight'] == sort_max_gld_weight]

print('max val gld:\n', max_val_gld, '\n')

locations = new_portfolio.loc[new_portfolio['gld_weight'] > 0.99]

print ('location:\n', locations)

Example output:

0.9779586087178525
0.02204139128214753
[0.97795861 0.02204139]
    Returns  Volatility  btc_weight  gld_weight
0  0.702820    0.627707    0.359024    0.640976
1  0.709807    0.846179    0.009670    0.990330
2  0.708724    0.801756    0.063786    0.936214
3  0.702010    0.616237    0.399496    0.600504
4  0.690441    0.835780    0.977959    0.022041 

Sort:
     Returns  Volatility  btc_weight  gld_weight
4  0.690441    0.835780    0.977959    0.022041
3  0.702010    0.616237    0.399496    0.600504
0  0.702820    0.627707    0.359024    0.640976
2  0.708724    0.801756    0.063786    0.936214
1  0.709807    0.846179    0.009670    0.990330 

sort max_gld_weight : "0.02204139128214753"

sort max_gld_weight : "0.9903300366638084"

Describe:
         Returns  Volatility  btc_weight  gld_weight
count  5.000000    5.000000    5.000000    5.000000
mean   0.702760    0.745532    0.361987    0.638013
std    0.007706    0.114057    0.385321    0.385321
min    0.690441    0.616237    0.009670    0.022041
25%    0.702010    0.627707    0.063786    0.600504
50%    0.702820    0.801756    0.359024    0.640976
75%    0.708724    0.835780    0.399496    0.936214
max    0.709807    0.846179    0.977959    0.990330 

desc max_gld_weight : "0.9903300366638084"

max val gld:
     Returns  Volatility  btc_weight  gld_weight
1  0.709807    0.846179     0.00967     0.99033 

location:
    Returns  Volatility  btc_weight  gld_weight
1  0.709807    0.846179     0.00967     0.99033
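In the Describe output above, the max row pairs the largest btc_weight (0.977959) with the largest gld_weight (0.990330), and those two values come from different portfolios. The sketch below, with made-up weights, illustrates that describe() summarises each column on its own, so a row of its output need not correspond to any single row of the data; idxmax is one way to fetch the actual row.

```python
import pandas as pd

# Hypothetical weights; every row sums to 1.
df = pd.DataFrame({
    "btc_weight": [0.9, 0.5, 0.1],
    "gld_weight": [0.1, 0.5, 0.9],
})

desc = df.describe()
row_sums = df["btc_weight"] + df["gld_weight"]
max_row_sum = desc.loc["max", "btc_weight"] + desc.loc["max", "gld_weight"]

# Row that actually holds the largest gld_weight.
best = df.loc[df["gld_weight"].idxmax()]

print(row_sums.tolist())
print(max_row_sum)
print(best)
```

Here every data row sums to 1, while the two max entries from describe() add up to more than 1.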

Update 1:

for x in range(ranger):
    weights = np.random.random(2)
    print (weights)
    a = weights[0]/np.sum(weights)  # see comments below.
    print (weights[0])
    b = weights[1]/np.sum(weights)  # see comments below.
    print (weights[1])
    print ('w0 + w1=', weights[0] + weights[1])
    weights /= np.sum(weights)
    btc_weight.append(a)
    gld_weight.append(b)
    print('a=', a, 'b=',b , 'a+b=', a+b)

Example of the new output:

[0.37710183 0.72933416]
0.3771018292953062
0.7293341569809412
w0 + w1= 1.1064359862762474
a= 0.34082570882790686 b= 0.6591742911720931 a+b= 1.0
[0.09301326 0.05296838]
0.09301326441107827
0.05296838430180717
w0 + w1= 0.14598164871288544
a= 0.637157240181712 b= 0.3628427598182879 a+b= 1.0
[0.48501305 0.56078073]
0.48501305100305336
0.5607807281299131
w0 + w1= 1.0457937791329663
a= 0.46377503928658087 b= 0.5362249607134192 a+b= 1.0
[0.41271663 0.89734662]
0.4127166254704412
0.8973466186511199
w0 + w1= 1.3100632441215612
a= 0.31503564986069105 b= 0.6849643501393089 a+b= 1.0
[0.11854074 0.57862593]
0.11854073835784273
0.5786259314340823
w0 + w1= 0.697166669791925
a= 0.1700321364950252 b= 0.8299678635049749 a+b= 1.0

Values printed outside the for loop:

0.1700321364950252
0.8299678635049749
[0.17003214 0.82996786]
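As an alternative to the a/b variables, the weights can be normalised in one step for all portfolios at once. This is only a sketch under assumptions not in the original: a seeded np.random.default_rng generator, five portfolios, and the names raw and n_portfolios.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only to make the run repeatable
n_portfolios = 5

raw = rng.random((n_portfolios, 2))             # one row of raw draws per portfolio
weights = raw / raw.sum(axis=1, keepdims=True)  # normalise each row exactly once

new_portfolio = pd.DataFrame(weights, columns=["btc_weight", "gld_weight"])
print(new_portfolio)
print(new_portfolio.sum(axis=1))  # each row should sum to 1
```

Because no element is overwritten before the division, each weight is simply its raw draw divided by the row total.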