The question "Efficiently create a density plot for high-density regions, points for sparse regions" requires replacing low-density regions with NaN. The relevant code from the accepted answer is as follows:
hh, locx, locy = np.histogram2d(xdat, ydat, range=xyrange, bins=bins)
posx = np.digitize(xdat, locx)
posy = np.digitize(ydat, locy)
#select points within the histogram
ind = (posx > 0) & (posx <= bins[0]) & (posy > 0) & (posy <= bins[1])
hhsub = hh[posx[ind] - 1, posy[ind] - 1] # values of the histogram where the points are
xdat1 = xdat[ind][hhsub < thresh] # low density points
ydat1 = ydat[ind][hhsub < thresh]
hh[hh < thresh] = np.nan # fill the areas with low density by NaNs
I found that something like
hh = np.where(hh > thresh, hh, np.nan)
also works. Is there any difference in terms of performance or results?
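Answer 0 (score: 1)
Advanced indexing (i.e. your original approach) is more efficient, and the result is the same!
I compared the two options with the following code: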
import numpy as np
import time

# Time in-place assignment through a boolean mask (advanced indexing)
t1 = time.time()
for i in range(1000):
    a = np.random.rand(10)
    a[a > 0.5] = np.nan
t2 = time.time()
print('adv. idx.: ', t2 - t1)

# Time np.where, which builds and returns a new array
t1 = time.time()
for i in range(1000):
    a = np.random.rand(10)
    a = np.where(a > 0.5, np.nan, a)
t2 = time.time()
print('np.where: ', t2 - t1)
The result is clear:
adv. idx.: 0.00600004196167
np.where: 0.0339999198914
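The loop above also times the call to np.random.rand and uses 10-element arrays, so per-call overhead may dominate the measurement. As a rough sketch only, the same comparison could be repeated with timeit on a larger array that is generated once outside the timed region. The function name compare_timings, the array size, and the repeat counts are illustrative assumptions, and the relative timings may vary with array size and NumPy version.

```python
import timeit
import numpy as np

def compare_timings(size=100_000, repeats=20, seed=0):
    """Time both NaN-replacement approaches on the same input (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    base = rng.random(size)  # created once, outside the timed region

    def masked_assignment():
        a = base.copy()        # copy so every run starts from clean data
        a[a > 0.5] = np.nan    # in-place boolean-mask assignment
        return a

    def where_call():
        # np.where allocates and returns a new array
        return np.where(base > 0.5, np.nan, base)

    # Take the best of 3 rounds to reduce noise from other processes.
    t_mask = min(timeit.repeat(masked_assignment, number=repeats, repeat=3))
    t_where = min(timeit.repeat(where_call, number=repeats, repeat=3))
    return t_mask, t_where

t_mask, t_where = compare_timings()
print('adv. idx.: ', t_mask)
print('np.where: ', t_where)
```

The copy inside masked_assignment is deliberate: np.where always allocates a new array, so including the copy compares the two approaches when both must leave the input untouched.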
In this test, np.where is clearly slower! The results are the same. However, since np.nan == np.nan evaluates to False, this cannot be verified with a == comparison.
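One way the equality could still be checked is with a NaN-aware comparison. The sketch below assumes NumPy 1.19 or later, where np.array_equal accepts equal_nan=True. The helper names mask_assign and mask_where and the sample arrays are made up for illustration. It also shows a detail of the two expressions in the question: hh < thresh and hh > thresh are not exact complements, so a bin count exactly equal to thresh is kept by the first form and replaced by the second.

```python
import numpy as np

def mask_assign(a, thresh):
    """Return a copy of `a` with values below `thresh` set to NaN."""
    out = a.astype(float)          # astype returns a copy; NaN needs a float dtype
    out[out < thresh] = np.nan
    return out

def mask_where(a, thresh):
    """Return a new array that keeps only values strictly above `thresh`."""
    return np.where(a > thresh, a, np.nan)

thresh = 2.0
no_tie = np.array([0.0, 1.0, 3.0, 5.0])    # no value equals thresh
with_tie = np.array([0.0, 2.0, 3.0, 5.0])  # one value equals thresh

r1 = mask_assign(no_tie, thresh)
r2 = mask_where(no_tie, thresh)

# Plain == cannot confirm equality, because NaN != NaN.
print((r1 == r2).all())                        # False

# equal_nan=True treats NaNs at the same positions as equal.
print(np.array_equal(r1, r2, equal_nan=True))  # True

# With a value exactly at the threshold, the two forms disagree.
print(mask_assign(with_tie, thresh))  # the 2.0 entry is kept
print(mask_where(with_tie, thresh))   # the 2.0 entry becomes NaN
```

If the two forms should agree at the boundary as well, the condition passed to np.where can be written as hh >= thresh.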