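I have a MySQL dataset of latitudes, longitudes, and values. I'm trying to sum the values whose lat/lng coordinates fall within given radii of other lat/lng coordinates (let's call those "foci"). The tricky part is that I'm trying to separate the distinct coordinates from the areas of overlap, e.g., where radius 1 overlaps radius 2.
Each focus has several radius "zones", so for any given lat/lng coordinate there can be a lot to sum. I've managed to put together a query that mostly works, although it's a bit slow: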
Select
Sum(If(`zone`='z0_0x1_0',`value`,0)) as `z0_0x1_0`,
Sum(If(`zone`='z0_0x1_1',`value`,0)) as `z0_0x1_1`,
Sum(If(`zone`='z0_0x1_2',`value`,0)) as `z0_0x1_2`,
Sum(If(`zone`='z0_0x1_3',`value`,0)) as `z0_0x1_3`,
Sum(If(`zone`='z0_1x1_0',`value`,0)) as `z0_1x1_0`,
Sum(If(`zone`='z0_1x1_1',`value`,0)) as `z0_1x1_1`,
Sum(If(`zone`='z0_1x1_2',`value`,0)) as `z0_1x1_2`,
Sum(If(`zone`='z0_2x1_0',`value`,0)) as `z0_2x1_0`,
Sum(If(`zone`='z0_2x1_1',`value`,0)) as `z0_2x1_1`,
Sum(If(`zone`='z0_3x1_0',`value`,0)) as `z0_3x1_0`,
Sum(If(`zone`='z0_3x1_1',`value`,0)) as `z0_3x1_1`,
Sum(If(`zone`='z0_0',`value`,0)) as `z0_0`,
Sum(If(`zone`='z0_1',`value`,0)) as `z0_1`,
Sum(If(`zone`='z0_2',`value`,0)) as `z0_2`,
Sum(If(`zone`='z0_3',`value`,0)) as `z0_3`,
Sum(If(`zone`='z1_0',`value`,0)) as `z1_0`,
Sum(If(`zone`='z1_1',`value`,0)) as `z1_1`,
Sum(If(`zone`='z1_2',`value`,0)) as `z1_2`,
Sum(If(`zone`='z1_3',`value`,0)) as `z1_3`
From
(Select `lat`, `lng`, `value`,
Case
When ((`dist_0` Between 2.8723597844095 And 4.3343662110324) And (`dist_1` Between 3.6260179152491 And 5.4681062617155)) Then 'z0_0x1_0'
When ((`dist_0` Between 2.8723597844095 And 4.3343662110324) And (`dist_1` Between 2.1278369006061 And 3.6260179152491)) Then 'z0_0x1_1'
When ((`dist_0` Between 2.8723597844095 And 4.3343662110324) And (`dist_1` Between 1.3333495959677 And 2.1278369006061)) Then 'z0_0x1_2'
When ((`dist_0` Between 2.8723597844095 And 4.3343662110324) And (`dist_1` Between 0 And 1.3333495959677)) Then 'z0_0x1_3'
When ((`dist_0` Between 1.68658498678 And 2.8723597844095) And (`dist_1` Between 3.6260179152491 And 5.4681062617155)) Then 'z0_1x1_0'
When ((`dist_0` Between 1.68658498678 And 2.8723597844095) And (`dist_1` Between 2.1278369006061 And 3.6260179152491)) Then 'z0_1x1_1'
When ((`dist_0` Between 1.68658498678 And 2.8723597844095) And (`dist_1` Between 1.3333495959677 And 2.1278369006061)) Then 'z0_1x1_2'
When ((`dist_0` Between 1.0573158612197 And 1.68658498678) And (`dist_1` Between 3.6260179152491 And 5.4681062617155)) Then 'z0_2x1_0'
When ((`dist_0` Between 1.0573158612197 And 1.68658498678) And (`dist_1` Between 2.1278369006061 And 3.6260179152491)) Then 'z0_2x1_1'
When ((`dist_0` Between 0 And 1.0573158612197) And (`dist_1` Between 3.6260179152491 And 5.4681062617155)) Then 'z0_3x1_0'
When ((`dist_0` Between 0 And 1.0573158612197) And (`dist_1` Between 2.1278369006061 And 3.6260179152491)) Then 'z0_3x1_1'
When ((`dist_0` Between 2.8723597844095 And 4.3343662110324)) Then 'z0_0'
When ((`dist_0` Between 1.68658498678 And 2.8723597844095)) Then 'z0_1'
When ((`dist_0` Between 1.0573158612197 And 1.68658498678)) Then 'z0_2'
When ((`dist_0` Between 0 And 1.0573158612197)) Then 'z0_3'
When ((`dist_1` Between 3.6260179152491 And 5.4681062617155)) Then 'z1_0'
When ((`dist_1` Between 2.1278369006061 And 3.6260179152491)) Then 'z1_1'
When ((`dist_1` Between 1.3333495959677 And 2.1278369006061)) Then 'z1_2'
When ((`dist_1` Between 0 And 1.3333495959677)) Then 'z1_3'
End As `zone`
From
(Select `lat`, `lng`, `value`,
(acos(0.65292272498833*sin(radians(`lat`)) + 0.75742452772129*cos(radians(`lat`))*cos(radians(`lng`)-(-1.2910922519714))) * 6371) as `dist_0`,
(acos(0.65251345816785*sin(radians(`lat`)) + 0.75777713538338*cos(radians(`lat`))*cos(radians(`lng`)-(-1.2916315412569))) * 6371) as `dist_1`
From `pop`
Where
((`lat` Between 40.714353892125 And 40.810300107875) And (`lng` Between -74.037474145971 And -73.910799854029)) Or
((`lat` Between 40.673205922895 And 40.789544077105) And (`lng` Between -74.081798776797 And -73.928273223203))
)
As FirstCut
)
As Zonecut
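Here's the logic:
First, it grabs the bounding box around each focus's largest radius. (This is the FirstCut query.) That cuts the number of data points we're looking at by several orders of magnitude.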
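The following is a minimal Python sketch of how the FirstCut bounding box could be computed before the SQL is generated. It is not the original PHP. The helper name bounding_box and the small-radius approximation are assumptions: the latitude span is r/R converted to degrees, and the longitude span is widened by 1/cos(latitude). The sketch ignores the poles and boxes that cross the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # the same mean Earth radius the query multiplies by


def bounding_box(lat_deg, lng_deg, radius_km):
    """Return (min_lat, max_lat, min_lng, max_lng) in degrees around a focus.

    Small-radius approximation: fine for a few km, but it does not handle
    the poles or boxes that cross the antimeridian.
    """
    # Angular radius in degrees along a meridian.
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # Meridians converge, so a degree of longitude is shorter by cos(lat).
    dlng = dlat / math.cos(math.radians(lat_deg))
    return lat_deg - dlat, lat_deg + dlat, lng_deg - dlng, lng_deg + dlng


# Hypothetical focus near focus 0 in the question, using its largest radius.
box = bounding_box(40.7623, -73.9741, 4.3343662110324)
print("(`lat` Between {0} And {1}) And (`lng` Between {2} And {3})".format(*box))
```

Padding the box slightly is harmless, because the exact distance test happens later anyway. A box that is too small, by contrast, would silently drop points.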
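Then it goes through that data and gets each data point's distance from each focus. Here those distances are dist_0 and dist_1, but there can be any number of foci; I used two in this example to show how it works. This is the great-circle distance formula (strictly speaking the spherical law of cosines, not Haversine), and multiplying by 6371 gives the result in kilometres.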
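The dist_0/dist_1 expressions can be checked outside MySQL with a small Python function that uses the same formula. The constants 0.65292272498833 and 0.75742452772129 in the query appear to be the sine and cosine of focus 0's latitude (about 40.76°), precomputed so that MySQL only evaluates the point's side. The function below (great_circle_km, a hypothetical name) computes both sides itself.

The clamp is an addition made in this sketch. Floating-point rounding can push the cosine slightly past 1 for a point sitting on a focus. MySQL's ACOS() returns NULL outside [-1, 1], which would make the Between tests for that focus fail.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km via the spherical law of cosines."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlng = math.radians(lng2) - math.radians(lng1)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlng)
    # Rounding can push c just outside [-1, 1]; math.acos would raise ValueError there.
    c = max(-1.0, min(1.0, c))
    return math.acos(c) * EARTH_RADIUS_KM


# One degree of latitude along a meridian: about 111.19 km.
print(great_circle_km(0.0, 0.0, 1.0, 0.0))
```

For rings measured in kilometres this formula is accurate enough in double precision. Haversine is the usual choice when very short distances need to be resolved precisely.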
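Then the Case statement kicks in and assigns each coordinate membership in a "zone", working from the most complex zones (intersections) to the least complex. The zone code just means "focus X, radius Y", so "z0_1" means "focus 0, radius 1" (radius 0 is the outermost ring). An "x" means the point is in an intersection of several zones, e.g. "z0_1x1_2". The zone code is assigned as a plain string.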
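The zone labelling done by the Case statement can be mirrored in Python to sanity-check the generated branches. The sketch makes the following assumptions:

- Each focus's ring edges are kept as a sorted list, innermost first. The lists below are copied from the query.
- Between's inclusive edges are reproduced by letting the outer ring win ties, as the query's branch order does.
- radius_index and zone_code are hypothetical helper names.
- Extending the "x" join past two foci is a guess, since the question only shows the two-focus case.

```python
import bisect

# Ring edges in km, innermost first, copied from the question's query.
FOCUS_EDGES = [
    [0.0, 1.0573158612197, 1.68658498678, 2.8723597844095, 4.3343662110324],
    [0.0, 1.3333495959677, 2.1278369006061, 3.6260179152491, 5.4681062617155],
]


def radius_index(dist_km, edges):
    """Radius index as used in the zone codes (0 = outermost ring), or None."""
    if dist_km < edges[0] or dist_km > edges[-1]:
        return None
    rings = len(edges) - 1
    # Ring k spans edges[k]..edges[k+1]; on a shared edge the outer ring wins,
    # matching the query listing outer rings first with inclusive Between.
    k = min(bisect.bisect_right(edges, dist_km) - 1, rings - 1)
    return rings - 1 - k


def zone_code(dists, edges_per_focus=FOCUS_EDGES):
    """'z0_1x1_2' for a point in two rings, 'z0_1' for one ring, None for none."""
    hits = []
    for f, (d, edges) in enumerate(zip(dists, edges_per_focus)):
        idx = radius_index(d, edges)
        if idx is not None:
            hits.append(f"{f}_{idx}")
    return "z" + "x".join(hits) if hits else None


print(zone_code([2.0, 1.5]))  # z0_1x1_2
```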
Finally, the zone codes are used to total everything up: each zone name gets its own Sum(If()) column. (For whatever reason, If() seems slightly faster than Case() here.)
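The outer Sum(If()) select can also be written as a plain loop, which makes it clearer that it is a pivot. There is one output column per zone code, and each column sums value over the rows carrying that code. pivot_sums is a hypothetical name, and rows stands in for the Zonecut result as (zone, value) pairs.

```python
def pivot_sums(rows, zone_columns):
    """Python equivalent of Sum(If(`zone`='z..', `value`, 0)) As `z..`, one per column.

    rows: iterable of (zone, value) pairs; zone is None where no Case branch matched.
    Unlike SQL, an empty input gives zeros here rather than NULLs.
    """
    totals = {zone: 0 for zone in zone_columns}
    for zone, value in rows:
        # A row adds to exactly one column, or to none if its zone is not listed.
        if zone in totals:
            totals[zone] += value
    return totals


rows = [("z0_1", 5), ("z0_1x1_2", 3), (None, 7), ("z0_1", 2)]
print(pivot_sums(rows, ["z0_1", "z0_1x1_2", "z1_0"]))  # {'z0_1': 7, 'z0_1x1_2': 3, 'z1_0': 0}
```

The same totals are what Group By `zone` with Sum(`value`) would return, as rows instead of columns. That form returns one row per zone that actually occurs, so there is no need to list every possible zone in advance. Whether it is faster on this table has not been measured here.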
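That outputs a list of zones and sums to my script (PHP). Obviously the whole query is generated programmatically, since you have to work out ahead of time every zone that could actually get "hits". That is done as preprocessing so it doesn't have to happen in SQL.
Is there a smarter way to do this? The part where I assign each point a string and then filter that string back out into columns feels hacky and not very elegant. But I can't find a better way to sort the points into columns within one big Case statement, which seems much faster than lots of separate Case statements.
Any and all feedback is appreciated. The MySQL table is huge (millions of rows) and indexed to holy hell. The query above runs in about 0.6 seconds, which isn't bad, but it starts taking longer as more foci are added. At this stage I'm just trying to think my way through the SQL logic. Thanks.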
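As a rough illustration of that preprocessing step, the Python sketch below generates the Case expression from each focus's ring edges. It is not the asker's PHP; build_case, ring_bounds, between and FOCUS_EDGES are hypothetical names.

It also differs from the query in one respect. It emits every pairwise combination: 24 branches for two foci with four rings each. The question's generator evidently prunes combinations that cannot occur geometrically, which leaves 19 branches. Overlaps of three or more foci are not handled here.

```python
from itertools import combinations

# Ring edges in km, innermost first, copied from the question's query.
FOCUS_EDGES = [
    [0.0, 1.0573158612197, 1.68658498678, 2.8723597844095, 4.3343662110324],
    [0.0, 1.3333495959677, 2.1278369006061, 3.6260179152491, 5.4681062617155],
]


def ring_bounds(edges):
    """(low, high) per radius index, with index 0 the outermost ring."""
    n = len(edges) - 1
    return [(edges[n - 1 - i], edges[n - i]) for i in range(n)]


def between(col, lo, hi):
    return f"(`{col}` Between {lo} And {hi})"


def build_case(edges_per_focus):
    rings = [ring_bounds(e) for e in edges_per_focus]
    lines = ["Case"]
    # Pairwise intersections first, so they win over the single-ring branches.
    for a, b in combinations(range(len(rings)), 2):
        for i, (alo, ahi) in enumerate(rings[a]):
            for j, (blo, bhi) in enumerate(rings[b]):
                cond = between(f"dist_{a}", alo, ahi) + " And " + between(f"dist_{b}", blo, bhi)
                lines.append(f"When ({cond}) Then 'z{a}_{i}x{b}_{j}'")
    # Then one branch per single ring, outermost first.
    for f, bounds in enumerate(rings):
        for i, (lo, hi) in enumerate(bounds):
            lines.append(f"When {between(f'dist_{f}', lo, hi)} Then 'z{f}_{i}'")
    lines.append("End As `zone`")
    return "\n".join(lines)


print(build_case(FOCUS_EDGES))
```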
Answer (score: 1)
I haven't checked this carefully, but it looks like the big CASE could be shortened somewhat:
CONCAT(
( CASE
WHEN (dist_0 ... ) THEN 'z0_0'
WHEN (dist_0 ... ) THEN 'z0_1'
...
ELSE '' END ),
( CASE
WHEN (dist_1 ... ) THEN 'z1_0'
WHEN (dist_1 ... ) THEN 'z1_1'
...
ELSE '' END ) ) AS zone
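Here is a Python mirror of what the suggested CONCAT produces. It is a sketch under these assumptions:

- Ring edges are copied from the question.
- radius_index and concat_zone are hypothetical helpers.
- Each per-focus CASE lists rings outermost first, as in the question.

The ELSE '' matters because MySQL's CONCAT() returns NULL if any argument is NULL. Note that labels come out as "z0_1z1_2" rather than the question's "z0_1x1_2", so the Sum(If()) column list would need to use the concatenated form.

```python
import bisect

# Ring edges in km, innermost first, copied from the question's query.
FOCUS_EDGES = [
    [0.0, 1.0573158612197, 1.68658498678, 2.8723597844095, 4.3343662110324],
    [0.0, 1.3333495959677, 2.1278369006061, 3.6260179152491, 5.4681062617155],
]


def radius_index(dist_km, edges):
    """Radius index (0 = outermost ring) of dist_km, or None outside every ring."""
    if dist_km < edges[0] or dist_km > edges[-1]:
        return None
    rings = len(edges) - 1
    # Outer ring wins on a shared edge, like an outermost-first CASE with BETWEEN.
    k = min(bisect.bisect_right(edges, dist_km) - 1, rings - 1)
    return rings - 1 - k


def concat_zone(dists, edges_per_focus=FOCUS_EDGES):
    """Mirror of CONCAT((CASE ... ELSE '' END), (CASE ... ELSE '' END))."""
    parts = []
    for f, (d, edges) in enumerate(zip(dists, edges_per_focus)):
        idx = radius_index(d, edges)
        parts.append("" if idx is None else f"z{f}_{idx}")  # ELSE ''
    return "".join(parts)


print(concat_zone([2.0, 1.5]))  # z0_1z1_2
```

The saving is in branch count. With k foci of n rings each, this needs k·n WHEN branches in total. A single CASE that also lists every pairwise intersection adds up to k(k−1)/2·n² more, for example 16 extra for two foci with four rings.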