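SQL is slow only when comparing values from a JOIN (Presto / Amazon Athena)

Time: 2018-11-12 20:35:00

Tags: sql amazon-athena presto

I have two tables: one contains data, the other contains metadata.

The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected into a specific coordinate system. The relevant part of the schema is: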

 ------------------
| x     | smallint |
|------------------|
| y     | smallint |
|------------------|
| value | string   |
 ------------------

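The metadata table contains the latitude and longitude values that correspond to each x, y coordinate. The relevant part of the schema is: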

 ----------------------------
| x         | smallint       |
|----------------------------|
| y         | smallint       |
|----------------------------|
| latitude  | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
 ----------------------------

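A JOIN on these two tables gives you the actual latitude/longitude for a specific x/y coordinate. This makes the tables much easier to query, because you don't need to know anything about map projections.

An example query that retrieves rows from the tables: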

    SELECT
      main.x,
      main.y,
      latitude,
      longitude,
      value
    FROM database.main JOIN database.meta
    ON main.x=meta.x AND main.y=meta.y
    WHERE
      main.x=1 AND main.y<=2

The result set looks like this:

 --------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345   | 54.321    | row1  |
|--------------------------------------|
| 1 | 2 | 12.345   | 98.765    | row2  |
 --------------------------------------

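This query takes only 1-2 seconds, which is great!

My problem is that when I run a query whose WHERE clause compares latitude and longitude, the query works but takes more than 60 seconds to run...

For example: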

    SELECT
      main.x,
      main.y,
      latitude,
      longitude,
      value
    FROM database.main JOIN database.meta
    ON main.x=meta.x AND main.y=meta.y
    WHERE
      latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'

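I know I'm probably missing something fundamental about SQL and JOINs that makes this query slow. Independent queries on each of the two tables are very fast, so I know I'm doing something wrong here that is related to the JOIN.

The question is: how can I make this (seemingly simple) comparison take a few seconds instead of 60+ seconds?

2 Answers:

Answer 0: (score: 0)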

Because you are working with geospatial data, you should use SQL's geography features to speed up the query. First, create a Point column:

    -- SQL Server (T-SQL) syntax
    create table test_geospatial(
    ...,
    ...,
    long decimal(18,15),
    lat decimal(18,15),
    -- geography::Point takes latitude first, then longitude, then the SRID
    point as geography::Point(lat, long, SRID) persisted
    )

Then create a spatial index on this column.
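A spatial index on the computed `point` column could then look roughly like the sketch below. This assumes SQL Server, that `test_geospatial` has a clustered primary key (which spatial indexes there require), and that the SRID is 4326. The index name `six_test_geospatial_point` and the SRID value are hypothetical, and none of this syntax carries over to Presto or Amazon Athena.

```sql
-- Assumption: SQL Server; the table has a clustered primary key.
-- The index name is made up for illustration.
CREATE SPATIAL INDEX six_test_geospatial_point
    ON test_geospatial (point);

-- Look up a location through the geography column instead of
-- comparing the two decimal columns separately.
-- Assumption: the point column was built with SRID 4326.
SELECT *
FROM test_geospatial
WHERE point.STEquals(geography::Point(12.345, 98.765, 4326)) = 1;
```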

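Answer 1: (score: -2)

The poor performance after adding the JOIN is probably because no index is available to optimize the join.

In your case, specifying an index on main(x,y) will most likely speed up the JOIN.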
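As a rough illustration of this suggestion, a composite index in an engine that supports secondary indexes might look like the sketch below. The index name `idx_main_xy` is hypothetical and the statement is generic SQL. As far as I know, Presto and Amazon Athena do not offer a `CREATE INDEX` statement for tables like these, so this only applies if the data lives in a database that does.

```sql
-- Assumption: an engine with secondary indexes (not Presto / Athena).
-- Composite index covering both join keys, in the order used by the ON clause.
CREATE INDEX idx_main_xy ON main (x, y);
```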