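I have a 2D data array and I'm trying to get a profile of its values about its center in an efficient manner. The output should be two 1D arrays: one with the distances from the center, and the other with the mean of all values in the original 2D array that lie at each of those distances.
Each index sits at a non-integer distance from the center, which prevents me from using some already-known solutions to this problem. Allow me to explain.
Consider these matrices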
import numpy as np
data = np.random.randn(5,5)
L = 2
x = np.arange(-L,L+1,1)*2.5
y = np.arange(-L,L+1,1)*2.5
xx, yy = np.meshgrid(x, y)
r = np.sqrt(xx**2. + yy**2.)
so the matrices are
In [30]: r
Out[30]:
array([[ 7.07106781, 5.59016994, 5. , 5.59016994, 7.07106781],
[ 5.59016994, 3.53553391, 2.5 , 3.53553391, 5.59016994],
[ 5. , 2.5 , 0. , 2.5 , 5. ],
[ 5.59016994, 3.53553391, 2.5 , 3.53553391, 5.59016994],
[ 7.07106781, 5.59016994, 5. , 5.59016994, 7.07106781]])
In [31]: data
Out[31]:
array([[ 1.27603322, 1.33635284, 1.93093228, 0.76229675, -0.00956535],
[ 0.69556071, -1.70829753, 1.19615919, -1.32868665, 0.29679494],
[ 0.13097791, -1.33302719, 1.48226442, -0.76672223, -1.01836614],
[ 0.51334771, -0.83863115, -0.41541794, 0.34743342, 0.1199237 ],
[-1.02042539, 0.90739383, -2.4858624 , -0.07417987, 0.90748933]])
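For this case, the expected output for the distances should be array([ 0. , 2.5 , 3.53553391, 5. , 5.59016994, 7.07106781]), together with a second array of the same length holding the mean of all values at each corresponding distance: array([ 0.98791323, -0.32496927, 0.37221219, -0.6209728 , 0.27986926, 0.04060628]).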
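In this answer there is a very nice function for computing the profile about any arbitrary point. The problem with his approach, however, is that it approximates the distance r by the index distance. So his r for my case would be something like this: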
array([[2, 2, 2, 2, 2],
[2, 1, 1, 1, 2],
[2, 1, 0, 1, 2],
[2, 1, 1, 1, 2],
[2, 2, 2, 2, 2]])
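This is a pretty big difference for me, since I'm working with small matrices. That approximation, however, is what allows him to use np.bincount, which is really handy (but won't work for me).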
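I've been trying to extend this to float distances, like my version of r, but so far no luck. bincount doesn't work with floats, and histogram needs equally spaced bins, which is not the case here. Any suggestions?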
Answer 0 (score: 1)
Approach #1
def radial_profile_app1(data, r):
    mid = data.shape[0]//2
    ids = np.rint((r**2)/r[mid-1,mid]**2).astype(int).ravel()
    count = np.bincount(ids)

    R = data.shape[0]//2 # Radial profile radius
    R0 = R+1
    dists = np.unique(r[:R0,:R0][np.tril(np.ones((R0,R0),dtype=bool))])

    mean_data = (np.bincount(ids, data.ravel())/count)[count!=0]
    return dists, mean_data
For the given sample data -
In [475]: radial_profile_app1(data, r)
Out[475]:
(array([ 0. , 2.5 , 3.53553391, 5. , 5.59016994,
7.07106781]),
array([ 1.48226442 , -0.3297520425, -0.8820454775, -0.3605795875,
0.5696863263, 0.2883829525]))
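To see why Approach #1 can use np.bincount even though the distances are floats, here is a small sketch on the 5x5 grid from the question. The variable name `spacing` is mine (it plays the role of `r[mid-1,mid]` in the function above), and the sketch assumes a square grid with uniform spacing: dividing r**2 by spacing**2 gives i**2 + j**2, an integer label for each distinct distance.

```python
import numpy as np

# Assumed setup: the 5x5 grid from the question, grid spacing 2.5.
L = 2
spacing = 2.5
x = np.arange(-L, L + 1) * spacing
xx, yy = np.meshgrid(x, x)
r = np.sqrt(xx**2 + yy**2)

# r**2 / spacing**2 equals i**2 + j**2 up to round-off, so rounding to the
# nearest integer gives one integer label per distinct distance.
ids = np.rint(r**2 / spacing**2).astype(int)
print(ids)

# Not every integer occurs as i**2 + j**2 (3, 6 and 7 are missing here),
# which is why the function filters the result with count != 0.
print(np.bincount(ids.ravel()))
```

The labels are monotonic in r, so the filtered bincount output lines up with the sorted unique distances.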
Approach #2
def radial_profile_app2(data, r):
    R = data.shape[0]//2 # Radial profile radius
    range_arr = np.arange(-R,R+1)
    ids = (range_arr[:,None]**2 + range_arr**2).ravel()
    count = np.bincount(ids)

    R0 = R+1
    dists = np.unique(r[:R0,:R0][np.tril(np.ones((R0,R0),dtype=bool))])

    mean_data = (np.bincount(ids, data.ravel())/count)[count!=0]
    return dists, mean_data
Runtime test -
In [562]: # Setup inputs
...: N = 2001
...: data = np.random.randn(N,N)
...: L = (N-1)//2
...: x = np.arange(-L,L+1,1)*2.5
...: y = np.arange(-L,L+1,1)*2.5
...: xx, yy = np.meshgrid(x, y)
...: r = np.sqrt(xx**2. + yy**2.)
...:
In [563]: out01, out02 = radial_profile_app1(data, r)
...: out11, out12 = radial_profile_app2(data, r)
...:
...: print np.allclose(out01, out11)
...: print np.allclose(out02, out12)
...:
True
True
In [566]: %timeit radial_profile_app1(data, r)
...: %timeit radial_profile_app2(data, r)
...:
10 loops, best of 3: 114 ms per loop
10 loops, best of 3: 91.2 ms per loop
Answer 1 (score: 0)
I got what I was expecting with this function:
def radial_prof(data, r):
    uniq = np.unique(r)
    prof = np.array([ np.mean(data[ r==un ]) for un in uniq ])
    return uniq, prof
But I'm still not happy with the fact that I had to use a list comprehension (or a Python loop), since it can be slow for very large matrices.
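One possible way to drop the Python loop, offered as a sketch rather than a tested replacement: np.unique can return, for every element, the index of its unique value (return_inverse=True), and those integer indices are exactly what np.bincount accepts. The function name radial_prof_vectorized is hypothetical. Like the loop version, it relies on equal distances being bit-identical floats.

```python
import numpy as np

def radial_prof_vectorized(data, r):
    # Hypothetical loop-free variant of radial_prof.
    # inverse[k] is the position in uniq of the k-th flattened distance.
    uniq, inverse = np.unique(r.ravel(), return_inverse=True)
    sums = np.bincount(inverse, weights=data.ravel())   # sum per distance
    counts = np.bincount(inverse)                       # pixels per distance
    return uniq, sums / counts

# Usage on a grid built the same way as in the question.
L = 2
x = np.arange(-L, L + 1) * 2.5
xx, yy = np.meshgrid(x, x)
r = np.sqrt(xx**2 + yy**2)
data = np.random.randn(5, 5)
dists, prof = radial_prof_vectorized(data, r)
print(dists)
print(prof)
```

If r comes from a computation where equal distances may differ by round-off, rounding r to a fixed number of decimals before calling np.unique would be one way to merge them; that is an assumption about the input, not something the question requires.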
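Answer 2 (score: 0)
Here is an indirect-sort approach that should scale well if the batch size and/or the number of bins is large. The sorting is O(n log n), and all the histogramming is O(n). I've also added a bit of unscientific speed testing. For the speed test I use flat indexing, but I've left the 2D-index code in because it is more flexible when dealing with images of different sizes, etc.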
import numpy as np
# this need only be run once per batch
def r_to_ind(r, dist_bins="auto"):
    f = np.argsort(r.ravel())
    # check the type first so that passing an array of bin edges does not
    # trigger an elementwise comparison with the string "auto"
    if isinstance(dist_bins, str) and dist_bins == "auto":
        rs = r.ravel()[f]
        bins = np.where(np.r_[True, rs[1:]!=rs[:-1]])[0]
        dist_bins = rs[bins]
    else:
        bins = np.searchsorted(r.ravel()[f], dist_bins)
    denom = np.diff(np.r_[bins, r.size])
    return f, np.unravel_index(f, r.shape), bins, denom, dist_bins

# this is with adjustable offset
def profile_xy(image, yx, ij, bins, nynx, denom):
    (y, x), (i, j), (ny, nx) = yx, ij, nynx
    return np.add.reduceat(image[i + y - ny//2, j + x - nx//2], bins) / denom

# this is fixed
def profile_xy_no_offset(image, ij, bins, denom):
    return np.add.reduceat(image[ij], bins) / denom

# this is fixed and flat
def profile_xy_no_offset_flat(image, k, bins, denom):
    return np.add.reduceat(image.ravel()[k], bins) / denom

data = np.array([[ 1.27603322, 1.33635284, 1.93093228, 0.76229675, -0.00956535],
[ 0.69556071, -1.70829753, 1.19615919, -1.32868665, 0.29679494],
[ 0.13097791, -1.33302719, 1.48226442, -0.76672223, -1.01836614],
[ 0.51334771, -0.83863115, -0.41541794, 0.34743342, 0.1199237 ],
[-1.02042539, 0.90739383, -2.4858624 , -0.07417987, 0.90748933]])
r = np.array([[ 7.07106781, 5.59016994, 5. , 5.59016994, 7.07106781],
[ 5.59016994, 3.53553391, 2.5 , 3.53553391, 5.59016994],
[ 5. , 2.5 , 0. , 2.5 , 5. ],
[ 5.59016994, 3.53553391, 2.5 , 3.53553391, 5.59016994],
[ 7.07106781, 5.59016994, 5. , 5.59016994, 7.07106781]])
f, (i, j), bins, denom, dist_bins = r_to_ind(r)
result = profile_xy(data, (2, 2), (i, j), bins, (5, 5), denom)
print(dist_bins)
# [ 0. 2.5 3.53553391 5. 5.59016994 7.07106781]
print(result)
# [ 1.48226442 -0.32975204 -0.88204548 -0.36057959 0.56968633 0.28838295]
#########################
from timeit import timeit
n = 2001
batch = 100
fake = 10
a = np.random.random((fake, n, n))
l = np.linspace(-1, 1, n)**2
r = sum(np.ix_(l, l))
def run_all():
    f, ij, bins, denom, dist_bins = r_to_ind(r)
    for b in range(batch):
        profile_xy_no_offset_flat(a[b%fake], f, bins, denom)

print(timeit(run_all, number=10))
# 47.4157 (for 10 batches of 100 images of size 2001x2001)
# and my computer is slower than Divakar's ;-)
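I've done some benchmarking comparing mine with @Divakar's approach 3, with everything that can be precomputed factored out into a function that runs once per batch. General finding: they are similar; mine has a higher upfront cost but is faster after that. They only cross over at around 100 images per batch, though.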
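The sort-then-reduce idea behind this answer may be easier to follow in isolation. Below is a minimal sketch on made-up 1D data (the arrays `keys` and `vals` are invented for illustration): sort by distance, find where each run of equal distances starts, then let np.add.reduceat sum each run.

```python
import numpy as np

keys = np.array([2.5, 0.0, 5.0, 2.5, 5.0, 2.5])    # made-up distances
vals = np.array([1.0, 10.0, 4.0, 3.0, 6.0, 5.0])   # made-up data values

order = np.argsort(keys, kind="stable")   # indirect sort by distance
sorted_keys = keys[order]

# Index at which each run of equal distances begins in the sorted array.
starts = np.flatnonzero(np.r_[True, sorted_keys[1:] != sorted_keys[:-1]])
# Run lengths: difference between consecutive starts (and the array end).
counts = np.diff(np.r_[starts, keys.size])

# reduceat sums vals[order][starts[k]:starts[k+1]] for each k.
means = np.add.reduceat(vals[order], starts) / counts

print(sorted_keys[starts])   # [0.  2.5 5. ]
print(means)                 # [10.  3.  5.]
```

Only the last line depends on the data values; `order`, `starts` and `counts` depend on the distances alone, which is why they can be computed once per batch and reused for every image.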