I have a large, real-valued 1-D dataset named r. I want to plot:
mean(log(1+a*r)) vs a, with a > -1.
Here is my code:
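import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rr = pd.read_csv('goog.csv')
dd = rr['Close']
series = pd.Series(dd)
seriespct = series.pct_change()
seriespct[0] = seriespct.mean()  # the first pct_change value is NaN; fill it with the mean
dum1 = [0]*len(dd)
a = 1.
a_max = 1.
a_step = 0.01
a = np.arange(-3.+a_step, a_max, a_step)
n = len(a)
dum2 = [0]*n
m = len(dd)
for j in range(n):
    for i in range(m):
        dum1[i] = math.log(1+a[j]*seriespct[i])
    dum2[j] = np.mean(dum1)  # mean over the data for this value of a
plt.plot(a, dum2)
plt.show()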
How can I do this in a more elegant way?
Answer 0: (score: 3)
I would suggest this:
plt.plot(a, np.log(1 + r*a[:,None]).mean(1))
This has a big speed advantage because it avoids the Python-level for loop; when the dataset is large, the looping done inside numpy is much faster.
In [49]: a = np.arange(a_step-.3, a_max, a_step)
In [50]: r = np.random.random(100)
In [51]: timeit [scipy.mean(log(1+a[i]*r)) for i in range(len(a))]
100 loops, best of 3: 5.47 ms per loop
In [52]: timeit np.log(1 + r*a[:,None]).mean(1)
1000 loops, best of 3: 384 µs per loop
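The comparison above can be reproduced outside IPython with the standard timeit module. This is only a minimal sketch: the random data, the seed, and the helper names loop_version and broadcast_version are assumptions made for illustration, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, illustrative data only
r = rng.random(100)                   # stand-in for the real dataset
a = np.arange(0.01 - 0.3, 1.0, 0.01)

def loop_version(a, r):
    # one NumPy call per value of a, driven by a Python-level loop
    return np.array([np.mean(np.log(1 + ai * r)) for ai in a])

def broadcast_version(a, r):
    # a[:, None] has shape (len(a), 1), so the product has shape (len(a), len(r))
    return np.log(1 + r * a[:, None]).mean(axis=1)

# both versions should agree to floating-point precision
assert np.allclose(loop_version(a, r), broadcast_version(a, r))

t_loop = timeit.timeit(lambda: loop_version(a, r), number=100)
t_bcast = timeit.timeit(lambda: broadcast_version(a, r), number=100)
print(f"loop: {t_loop:.4f} s, broadcast: {t_bcast:.4f} s (100 runs each)")
```

Checking that the two versions agree before timing them guards against comparing the speed of two different computations.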
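This works by broadcasting, so that a varies along one axis and r varies along the other. You then take the mean along the axis where r varies, so you are still left with an array that varies with a (and has the same shape as a):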
import numpy as np
import matplotlib.pyplot as plt
r = np.random.random(100)
a = 1.
a_max = 1.
a_step = 0.01
a = np.arange(a_step-.3, a_max, a_step)
a.shape
#(129,)
a = a[:,None] #adds a new axis, making this a column vector, same as: a = a.reshape(-1,1)
a.shape
#(129, 1)
(a*r).shape
#(129, 100)
loga = np.log(1 + a*r)
loga.shape
#(129,100)
mloga = loga.mean(axis=1) #take the mean along the 2nd axis, where `r` varies
mloga.shape
#(129,)
plt.plot(a, mloga)
plt.show()
To avoid relying on broadcasting, you can use np.outer:
plt.plot(a, np.log(1 + np.outer(a,r)).mean(1))
This needs no reshaping of a (you can skip the step a = a[:,None]).
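As a quick check, the broadcasting and np.outer forms should give the same result. Both build an intermediate array of shape (len(a), len(r)), which can get large for a big dataset, so the sketch below also includes a chunked variant. The helper name mean_log_chunked and the chunk size are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)        # illustrative data only
r = rng.random(100)
a = np.arange(0.01 - 0.3, 1.0, 0.01)

via_broadcast = np.log(1 + r * a[:, None]).mean(axis=1)
via_outer = np.log(1 + np.outer(a, r)).mean(axis=1)
assert np.allclose(via_broadcast, via_outer)

def mean_log_chunked(a, r, chunk=32):
    """Compute mean(log(1 + a*r)) for each a, a block of a values at a time.

    Hypothetical helper: it limits the intermediate array to
    (chunk, len(r)) instead of (len(a), len(r)).
    """
    out = np.empty(len(a))
    for start in range(0, len(a), chunk):
        block = a[start:start + chunk]
        out[start:start + chunk] = np.log(1 + np.outer(block, r)).mean(axis=1)
    return out

assert np.allclose(mean_log_chunked(a, r), via_outer)
```

Chunking trades a short Python loop over blocks for a bounded memory footprint; for data that fits comfortably in memory the one-liner is simpler.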
Here is a simpler example so you can see what is happening:
r = np.exp(np.arange(1,5))
a = np.arange(5)
In [33]: r
Out[33]: array([ 2.71828183, 7.3890561 , 20.08553692, 54.59815003])
In [34]: a
Out[34]: array([0, 1, 2, 3, 4])
In [39]: r*a[:,None]
Out[39]:
# this is 2.7... 7.3... 20.08... 54.5... # times:
array([[ 0. , 0. , 0. , 0. ], # 0
[ 2.71828183, 7.3890561 , 20.08553692, 54.59815003], # 1
[ 5.43656366, 14.7781122 , 40.17107385, 109.19630007], # 2
[ 8.15484549, 22.1671683 , 60.25661077, 163.7944501 ], # 3
[ 10.87312731, 29.5562244 , 80.34214769, 218.39260013]]) # 4
In [40]: np.outer(a,r)
Out[40]:
array([[ 0. , 0. , 0. , 0. ],
[ 2.71828183, 7.3890561 , 20.08553692, 54.59815003],
[ 5.43656366, 14.7781122 , 40.17107385, 109.19630007],
[ 8.15484549, 22.1671683 , 60.25661077, 163.7944501 ],
[ 10.87312731, 29.5562244 , 80.34214769, 218.39260013]])
# this is the mean of each row (i.e. over r, for each value of a):
In [41]: (np.outer(a,r)).mean(1)
Out[41]: array([ 0. , 21.19775622, 42.39551244, 63.59326866, 84.79102488])
# and the log of 1 + the above is:
In [42]: np.log(1+(np.outer(a,r)).mean(1))
Out[42]: array([ 0. , 3.09999121, 3.77035604, 4.16811021, 4.4519144 ])
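Note that the last step above takes the log after averaging, whereas the quantity in the question, mean(log(1+a*r)), averages the logs. The two are generally different, so here is a small sketch with the same toy arrays that computes both side by side. The variable names are made up for illustration.

```python
import numpy as np

r = np.exp(np.arange(1, 5))
a = np.arange(5)

products = np.outer(a, r)                        # shape (5, 4)
log_of_mean = np.log(1 + products.mean(axis=1))  # what the toy example above computes
mean_of_log = np.log(1 + products).mean(axis=1)  # what the question asks for

print(log_of_mean)
print(mean_of_log)
```

Because log is concave, the mean of the logs can never exceed the log of the mean, and the two coincide only where all terms are equal (here, at a = 0).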
Answer 1: (score: 1)
You can use scipy for the mean.
You can use matplotlib for the plotting.
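import numpy as np
import scipy
from matplotlib import pyplot
#note: scipy.array, scipy.arange and scipy.mean are aliases of the NumPy
#functions and are deprecated in recent SciPy releases; np.array, np.arange
#and np.mean are drop-in replacements
#convert r from a python list to a 1-D array
r = scipy.array(r)
#edit these
a_max = 100
a_step = 0.1
a = scipy.arange(-1+a_step, a_max, a_step)
n = len(a)
#np.log works elementwise on the array 1+a[i]*r
pyplot.plot(a, [scipy.mean(np.log(1+a[i]*r)) for i in range(n)], 'b-')
pyplot.show()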
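One thing to watch with a wide range such as a_max = 100: if r contains negative values (for example, percentage changes), 1+a*r can become non-positive for large a, and the log is then undefined. The sketch below is one possible way to handle that, using synthetic data as a stand-in for r. The names arg, valid and mean_log are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
r = rng.normal(0.0, 0.02, size=1000)  # synthetic returns, an assumption for this sketch

a_max = 100
a_step = 0.1
a = np.arange(-1 + a_step, a_max, a_step)

arg = 1 + np.outer(a, r)              # shape (len(a), len(r))
valid = (arg > 0).all(axis=1)         # rows where the log is defined for every r

mean_log = np.full(a.shape, np.nan)   # NaN marks values of a outside the domain
mean_log[valid] = np.log(arg[valid]).mean(axis=1)
```

Plotting a against mean_log with matplotlib then simply leaves a gap wherever the value is NaN, instead of emitting runtime warnings from the log.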