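I have a scatter plot that is sorted into 4 bins. The bins are separated by two arcs and a line in the middle (see figure below).
There is a slight problem with the two arcs: if the X-coordinate is greater than the x-value of ang2, the point is not attributed to the correct bin (see figure below).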
import math
import matplotlib.pyplot as plt
import matplotlib as mpl
X = [24,15,71,72,6,13,77,52,52,62,46,43,31,35,41]
Y = [94,61,76,83,69,86,78,57,45,94,82,74,56,70,94]
fig, ax = plt.subplots()
ax.set_xlim(-100,100)
ax.set_ylim(-40,140)
ax.grid(False)
plt.scatter(X,Y)
#middle line
BIN_23_X = 0
#two arcs
ang1 = -60, 60
ang2 = 60, 60
angle = math.degrees(math.acos(2/9.15))
E_xy = 0,60
Halfway = mpl.lines.Line2D((BIN_23_X,BIN_23_X), (0,125), color = 'white', lw = 1.5, alpha = 0.8, zorder = 1)
arc1 = mpl.patches.Arc(ang1, 70, 110, angle = 0, theta2 = angle, theta1 = 360-angle, color = 'white', lw = 2)
arc2 = mpl.patches.Arc(ang2, 70, 110, angle = 0, theta2 = 180+angle, theta1 = 180-angle, color = 'white', lw = 2)
Oval = mpl.patches.Ellipse(E_xy, 160, 130, lw = 3, edgecolor = 'black', color = 'white', alpha = 0.2)
ax.add_line(Halfway)
ax.add_patch(arc1)
ax.add_patch(arc2)
ax.add_patch(Oval)
#Sorting the coordinates into bins
def get_nearest_arc_vert(x, y, arc_vertices):
    err = (arc_vertices[:,0] - x)**2 + (arc_vertices[:,1] - y)**2
    nearest = (arc_vertices[err == min(err)])[0]
    return nearest

arc1v = ax.transData.inverted().transform(arc1.get_verts())
arc2v = ax.transData.inverted().transform(arc2.get_verts())

def classify_pointset(vx, vy):
    bins = {(k+1):[] for k in range(4)}
    for (x,y) in zip(vx, vy):
        nx1, ny1 = get_nearest_arc_vert(x, y, arc1v)
        nx2, ny2 = get_nearest_arc_vert(x, y, arc2v)
        if x < nx1:
            bins[1].append((x,y))
        elif x > nx2:
            bins[4].append((x,y))
        else:
            if x < BIN_23_X:
                bins[2].append((x,y))
            else:
                bins[3].append((x,y))
    return bins

#Bins Output
bins_red = classify_pointset(X,Y)
all_points = [None] * 5
for bin_key in [1,2,3,4]:
    all_points[bin_key] = bins_red[bin_key]
Output:
[[], [], [(24, 94), (15, 61), (71, 76), (72, 83), (6, 69), (13, 86), (77, 78), (62, 94)], [(52, 57), (52, 45), (46, 82), (43, 74), (31, 56), (35, 70), (41, 94)]]
This isn't quite right. Looking at the figure output below, 4 coordinates lie in Bin 3 and 11 lie in Bin 4. However, 8 are attributed to Bin 3 and 7 are attributed to Bin 4.
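I think the problem is the blue coordinates, specifically those whose X-coordinate is greater than the x-value of ang2, which is 60. If I change these values to be less than 60, they are classified correctly (as Bin 4).
I'm not sure whether the arcs should be extended or whether the code can be improved instead.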
Note that this is not limited to ang2 and Bin 4. The same problem occurs with ang1 and Bin 1: if the X-coordinate is less than -60, the point will not be attributed to Bin 1.
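Intended output:
[[], [], [(24, 94), (15, 61), (6, 69), (13, 86)], [(71, 76), (72, 83), (52, 57), (52, 45), (46, 82), (43, 74), (31, 56), (35, 70), (41, 94), (77, 78), (62, 94)]]
Note: the intended output format is preferred. The example uses a single row of input data, but my dataset is larger. With numerous rows, the output should be reported row by row. For example: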
#Numerous rows
X = np.random.randint(50, size=(100, 10))
Y = np.random.randint(80, size=(100, 10))
Answer 0 (score: 1)
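Patches have a test for whether they contain a point: contains_point, and even one for arrays of points: contains_points.
Using these, I prepared a code snippet that you can add between the part where you add your patches and the #Sorting the coordinates into bins code block.
It adds two additional (transparent) ellipses so that you can compute whether the arcs would contain a point if they were fully closed ellipses. The bin calculation is then just a boolean combination of tests: whether a point belongs to the big oval, to the left or right ellipse, and whether its x-coordinate is positive or negative.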
ov1 = mpl.patches.Ellipse(ang1, 70, 110, alpha=0)
ov2 = mpl.patches.Ellipse(ang2, 70, 110, alpha=0)
ax.add_patch(ov1)
ax.add_patch(ov2)
for px, py in zip(X, Y):
    in_oval = Oval.contains_point(ax.transData.transform(([px, py])), 0)
    in_left = ov1.contains_point(ax.transData.transform(([px, py])), 0)
    in_right = ov2.contains_point(ax.transData.transform(([px, py])), 0)
    on_left = px < 0
    on_right = px > 0
    if in_oval:
        if in_left:
            n_bin = 1
        elif in_right:
            n_bin = 4
        elif on_left:
            n_bin = 2
        elif on_right:
            n_bin = 3
        else:
            n_bin = -1
    else:
        n_bin = -1
    print('({:>2}/{:>2}) is {}'.format(px, py, 'in Bin ' +str(n_bin) if n_bin>0 else 'outside'))
The output is:
(24/94) is in Bin 3
(15/61) is in Bin 3
(71/76) is in Bin 4
(72/83) is in Bin 4
( 6/69) is in Bin 3
(13/86) is in Bin 3
(77/78) is outside
(52/57) is in Bin 4
(52/45) is in Bin 4
(62/94) is in Bin 4
(46/82) is in Bin 4
(43/74) is in Bin 4
(31/56) is in Bin 4
(35/70) is in Bin 4
(41/94) is in Bin 4
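Note that you still have to decide how to define the bins for points with x-coord = 0. At the moment they are treated the same as outside, because neither on_left nor on_right is responsible for them...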
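One way to settle the x = 0 case is to let ties fall to one side. The sketch below is a hypothetical helper (assign_bin and middle_x are names invented here, not part of the code above) that takes the three containment results plus the x-coordinate and, as an arbitrary assumption, sends points lying exactly on the middle line to Bin 3. Swap the comparison if Bin 2 is preferred.

```python
def assign_bin(in_oval, in_left, in_right, px, middle_x=0):
    """Return the bin number (1-4), or -1 for points outside the big oval.

    Assumption (not from the original answer): a point exactly on the
    middle line is counted as Bin 3.
    """
    if not in_oval:
        return -1
    if in_left:
        return 1
    if in_right:
        return 4
    # '<' means px == middle_x falls through to the right-hand bin.
    return 2 if px < middle_x else 3


print(assign_bin(True, False, False, 0))   # point on the middle line
print(assign_bin(True, False, False, -5))  # point left of the middle line
```

Inside the loop above, this would replace the if/elif chain with n_bin = assign_bin(in_oval, in_left, in_right, px).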
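PS: Thanks to @ImportanceOfBeingErnest for the hint about the necessary transformation: https://stackoverflow.com/a/49112347/8300135
Note: all of the edits below require import numpy as np
EDIT:
A function that computes the bin distribution for each X, Y array input: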
def bin_counts(X, Y):
    bc = dict()
    E = Oval.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    E_l = ov1.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    E_r = ov2.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    L = np.array(X) < 0
    R = np.array(X) > 0
    bc[1] = np.sum(E & E_l)
    bc[2] = np.sum(E & L & ~E_l)
    bc[3] = np.sum(E & R & ~E_r)
    bc[4] = np.sum(E & E_r)
    return bc
This gives the following result:
bin_counts(X, Y)
Out: {1: 0, 2: 0, 3: 4, 4: 10}
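As a sanity check that does not depend on matplotlib, the same counts can be reproduced with the plain ellipse equation. This is only a sketch: inside and bin_counts_formula are names made up here, and the centres and sizes are copied from the Oval, ov1 and ov2 definitions above, assuming the patches behave like ideal axis-aligned ellipses.

```python
def inside(x, y, center, width, height):
    """True if (x, y) lies strictly inside an axis-aligned ellipse."""
    cx, cy = center
    return ((x - cx) / (width / 2.0)) ** 2 + ((y - cy) / (height / 2.0)) ** 2 < 1


def bin_counts_formula(xs, ys):
    """Count points per bin with the same boolean logic as bin_counts."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for x, y in zip(xs, ys):
        if not inside(x, y, (0, 60), 160, 130):   # big oval
            continue
        in_l = inside(x, y, (-60, 60), 70, 110)   # left ellipse (ov1)
        in_r = inside(x, y, (60, 60), 70, 110)    # right ellipse (ov2)
        if in_l:
            counts[1] += 1
        if x < 0 and not in_l:
            counts[2] += 1
        if x > 0 and not in_r:
            counts[3] += 1
        if in_r:
            counts[4] += 1
    return counts


X = [24, 15, 71, 72, 6, 13, 77, 52, 52, 62, 46, 43, 31, 35, 41]
Y = [94, 61, 76, 83, 69, 86, 78, 57, 45, 94, 82, 74, 56, 70, 94]
print(bin_counts_formula(X, Y))  # {1: 0, 2: 0, 3: 4, 4: 10}
```

The point (77, 78) falls just outside the big oval, which is why only 14 of the 15 sample points are counted.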
EDIT2: Many rows in two 2D arrays for X and Y:
np.random.seed(42)
X = np.random.randint(-80, 80, size=(100, 10))
Y = np.random.randint(0, 120, size=(100, 10))
Loop over all rows:
for xr, yr in zip(X, Y):
    print(bin_counts(xr, yr))
Result:
{1: 1, 2: 2, 3: 6, 4: 0}
{1: 1, 2: 0, 3: 4, 4: 2}
{1: 5, 2: 2, 3: 1, 4: 1}
...
{1: 3, 2: 2, 3: 2, 4: 0}
{1: 2, 2: 4, 3: 1, 4: 1}
{1: 1, 2: 1, 3: 6, 4: 2}
EDIT3: To return not the number of points in each bin but a list of four arrays, each holding the x,y coordinates of the points in one bin, use the following:
X = [24,15,71,72,6,13,77,52,52,62,46,43,31,35,41]
Y = [94,61,76,83,69,86,78,57,45,94,82,74,56,70,94]
def bin_points(X, Y):
    X = np.array(X)
    Y = np.array(Y)
    E = Oval.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    E_l = ov1.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    E_r = ov2.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    L = X < 0
    R = X > 0
    bp1 = np.array([X[E & E_l], Y[E & E_l]]).T
    bp2 = np.array([X[E & L & ~E_l], Y[E & L & ~E_l]]).T
    bp3 = np.array([X[E & R & ~E_r], Y[E & R & ~E_r]]).T
    bp4 = np.array([X[E & E_r], Y[E & E_r]]).T
    return [bp1, bp2, bp3, bp4]

print(bin_points(X, Y))
[array([], shape=(0, 2), dtype=int32), array([], shape=(0, 2), dtype=int32), array([[24, 94],
[15, 61],
[ 6, 69],
[13, 86]]), array([[71, 76],
[72, 83],
[52, 57],
[52, 45],
[62, 94],
[46, 82],
[43, 74],
[31, 56],
[35, 70],
[41, 94]])]
...and again, to apply this to large 2D arrays, just iterate over them:
np.random.seed(42)
X = np.random.randint(-100, 100, size=(100, 10))
Y = np.random.randint(-40, 140, size=(100, 10))
bincol = ['r', 'g', 'b', 'y', 'k']
for xr, yr in zip(X, Y):
    for i, binned_points in enumerate(bin_points(xr, yr)):
        ax.scatter(*binned_points.T, c=bincol[i], marker='o' if i<4 else 'x')
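Answer 1 (score: 1)
Here is my version, which sorts the points by ellipse. Since the OP uses simple geometric shapes, the test can be done with a simple formula (i.e. without "asking" the patch). I generalized it to n arcs, with the small drawback that the bin numbers do not run from left to right, but that can be handled elsewhere. The output has the form
[ [ [x,y], [x,y],...], ... ]
i.e. a list of x,y pairs for each bin. The numbering here, however, runs from -3 to 3, with 0 meaning outside.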
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np

def in_ellipse( xy, x0y0ab):
    x, y = xy
    x0, y0 = x0y0ab[0]
    a = x0y0ab[1]/2. ## as the list of ellipses takes width and not semi axis
    b = x0y0ab[2]/2.
    return ( x - x0 )**2 / a**2+ ( y - y0 )**2 / b**2 < 1

def sort_into_bins( xy, mainE, eList ):
    binCntr = 0
    xyA = (np.abs(xy[0]),xy[1]) ## all positive
    if in_ellipse( xyA, mainE ):
        binCntr +=1
        for ell in eList:
            if in_ellipse( xyA, ell ):
                break
            binCntr +=1
    binCntr=np.copysign( binCntr, xy[0] )
    return int( binCntr )

X = 200 * np.random.random(150) - 100
Y = 140 * np.random.random(150) - 70 + 60
fig, ax = plt.subplots()
ax.set_xlim(-100,100)
ax.set_ylim(-40,140)
ax.grid(False)
BIN_23_X = 0
mainEllipse = [ np.array([0, 60]), 160, 130 ]
allEllipses = [ [ np.array([60,60]), 70., 110. ], [ np.array([60,60]), 100, 160 ] ]
Halfway = mpl.lines.Line2D((BIN_23_X,BIN_23_X), (0,125), color = '#808080', lw = 1.5, alpha = 0.8, zorder = 1)
Oval = mpl.patches.Ellipse( mainEllipse[0], mainEllipse[1], mainEllipse[2], lw = 3, edgecolor = '#808080', facecolor = '#808080', alpha = 0.2)
ax.add_patch(Oval)
ax.add_line(Halfway)
for ell in allEllipses:
    arc = mpl.patches.Arc( ell[0] , ell[1], ell[2], angle = 0, color = '#808080', lw = 2, linestyle=':')
    ax.add_patch( arc )
    arc = mpl.patches.Arc( ell[0] * np.array([ -1, 1 ]), ell[1], ell[2], angle = 0, color = '#808080', lw = 2, linestyle=':')
    ax.add_patch( arc )

binDict = dict()
for x,y in zip(X,Y):
    binDict[( x,y)]=sort_into_bins( (x,y), mainEllipse, allEllipses )
rowEval=[]
for s in range(-3,4):
    rowEval+=[[]]
for key, val in binDict.items(): ## items() on Python 3 (iteritems() was Python 2 only)
    rowEval[ val + 3 ]+=[key]
for s in range(-3,4):
    if rowEval[ s + 3 ]: ## skip empty bins, scatter needs both x and y
        plt.scatter( *zip( *rowEval[ s + 3 ] ) )
plt.show()
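This displays the scatter plot with each bin drawn separately.
Note that I make use of the symmetry about x = 0. If the ellipses are shifted with respect to x, the code has to be modified slightly. Also note that the order in which the ellipses are provided matters!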
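If the ellipses are not mirror images about x = 0, one option is to pass a separate list for each side instead of folding the point with np.abs. The following is a sketch under that assumption, not the answer's code: sort_into_bins_asym, left_ellipses, right_ellipses and split_x are hypothetical names, ellipses are given as (cx, cy, width, height) tuples, and points on the dividing line are arbitrarily counted as right-hand. Each list must still be ordered from the innermost ellipse outward.

```python
def in_ellipse(x, y, cx, cy, width, height):
    """True if (x, y) lies strictly inside an axis-aligned ellipse."""
    a, b = width / 2.0, height / 2.0
    return (x - cx) ** 2 / a ** 2 + (y - cy) ** 2 / b ** 2 < 1


def sort_into_bins_asym(x, y, main, left_ellipses, right_ellipses, split_x=0.0):
    """Return a signed bin number; 0 means outside the main ellipse.

    Assumption: points with x == split_x are treated as right-hand.
    """
    if not in_ellipse(x, y, *main):
        return 0
    on_right = x >= split_x
    ellipses = right_ellipses if on_right else left_ellipses
    count = 1
    for ell in ellipses:          # innermost ellipse first
        if in_ellipse(x, y, *ell):
            break
        count += 1
    return count if on_right else -count


main = (0, 60, 160, 130)
right = [(60, 60, 70, 110), (60, 60, 100, 160)]
left = [(-60, 60, 70, 110), (-60, 60, 100, 160)]
for point in [(52, 57), (15, 61), (6, 69), (-52, 57), (77, 78)]:
    print(point, sort_into_bins_asym(point[0], point[1], main, left, right))
```

With the symmetric ellipses used here, the numbering agrees with the version above; shifting only the entries of left or right changes one side without affecting the other.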