How to join on a geography column using ST_CONTAINS in BigQuery

Time: 2019-05-31 16:25:39

Tags: google-bigquery gis

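I have a BigQuery table of addresses that includes latitude/longitude, and other BQ tables with valid geom definitions imported from census shapefiles. For each row in the address table, I am trying to find the geom row that contains it.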

The following query, where I look up a single lat/lng, works fine:

SELECT SLDLST FROM `geographies.tl_2018_sldl_*` sldl WHERE ST_CONTAINS(sldl.geom, ST_GEOGPOINT(-95.221080, 38.974500));

But when I try to generalize this into a join:

SELECT 
  address_id,
  SLDLST
FROM `launchpad-239920.address_standardization.temp_delete_geo_match_sample` ssgolden
LEFT JOIN `geographies.tl_2018_sldl_*` sldl ON ST_CONTAINS(sldl.geom, ST_GEOGPOINT(ssgolden.longitude, ssgolden.latitude));

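I get an error: "LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join."

How can I restructure my join query so that I can pull the matching geography for each address?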

2 Answers:

Answer 0: (score: 2)

Below is for BigQuery Standard SQL.

If you want to keep unmatched addresses in the output, you can use the query below:

#standardSQL
WITH matched_addresses AS (
  SELECT 
    address_id,
    SLDLST
  FROM `launchpad-239920.address_standardization.temp_delete_geo_match_sample` ssgolden
  JOIN `geographies.tl_2018_sldl_X` sldl 
  ON ST_CONTAINS(sldl.geom, ST_GEOGPOINT(ssgolden.longitude, ssgolden.latitude)) 
)
SELECT * FROM matched_addresses UNION ALL 
SELECT address_id, NULL 
FROM `launchpad-239920.address_standardization.temp_delete_geo_match_sample`
WHERE NOT address_id IN (SELECT address_id FROM matched_addresses)   

But if you are only interested in the matches, use the one below:

#standardSQL
WITH matched_addresses AS (
  SELECT 
    address_id,
    SLDLST
  FROM `launchpad-239920.address_standardization.temp_delete_geo_match_sample` ssgolden
  JOIN `geographies.tl_2018_sldl_X` sldl 
  ON ST_CONTAINS(sldl.geom, ST_GEOGPOINT(ssgolden.longitude, ssgolden.latitude)) 
)
SELECT * FROM matched_addresses  
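
As an alternative sketch to the UNION ALL form above, the unmatched rows can also be kept by computing the matches with an inner join first and then left-joining the address table to that result on address_id, which is a plain equality condition and so avoids the error from the question. This assumes address_id uniquely identifies a row in the address table, and it uses the wildcard table name from the question; neither assumption is confirmed by the answer itself.

```sql
#standardSQL
-- Sketch only: assumes address_id is unique per address row.
WITH matched_addresses AS (
  -- The spatial predicate is used in an inner join, as in the answer above.
  SELECT
    ssgolden.address_id,
    sldl.SLDLST
  FROM `launchpad-239920.address_standardization.temp_delete_geo_match_sample` ssgolden
  JOIN `geographies.tl_2018_sldl_*` sldl
  ON ST_CONTAINS(sldl.geom, ST_GEOGPOINT(ssgolden.longitude, ssgolden.latitude))
)
-- The outer join uses an equality on address_id,
-- so addresses without a match are kept with a NULL SLDLST.
SELECT
  a.address_id,
  m.SLDLST
FROM `launchpad-239920.address_standardization.temp_delete_geo_match_sample` a
LEFT JOIN matched_addresses m
ON a.address_id = m.address_id
```

If an address falls inside more than one geometry, it appears once per match, the same as in the inner-join query above.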

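Answer 1: (score: 0)

A solution that takes care of unmatched addresses automatically, without the UNION ALL that Mikhail suggests (which could improve performance):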

#standardSQL
WITH addresses AS (
  SELECT *, GENERATE_UUID() uuid
  FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015`  ssgolden
  WHERE DATE(ssgolden.pickup_datetime) = '2015-10-07'
), matched_addresses AS (
  SELECT ARRAY_AGG(
      IF(
        ST_CONTAINS(sldl.zone_geom, SAFE.ST_GEOGPOINT(ssgolden.pickup_longitude, ssgolden.pickup_latitude))
        , sldl.zone_name, null)
      IGNORE NULLS LIMIT 1)[OFFSET(0)] zone_name
  FROM addresses  ssgolden
  CROSS JOIN `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom`  sldl 
  GROUP BY uuid
)

SELECT zone_name, COUNT(*) c
FROM matched_addresses 
GROUP BY 1
ORDER BY c DESC

Now, let's test performance against a large set of geometries (74,133 of them, covering the whole US and more, in response to Michael's comment):

#standardSQL
WITH addresses AS (
  SELECT *, GENERATE_UUID() uuid
  FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015`  ssgolden
  WHERE DATE(ssgolden.pickup_datetime) = '2015-10-07'
), matched_addresses AS (
  SELECT ARRAY_AGG(
      IF(
        ST_CONTAINS(sldl.tract_geom, SAFE.ST_GEOGPOINT(ssgolden.pickup_longitude, ssgolden.pickup_latitude))
        , FORMAT('%s %s', sldl._table_suffix,sldl.lsad_name), null)
      IGNORE NULLS LIMIT 1)[OFFSET(0)] zone_name
  FROM addresses  ssgolden
  CROSS JOIN `bigquery-public-data.geo_census_tracts.census_tracts_*`   sldl 
  GROUP BY uuid
)

SELECT zone_name, COUNT(*) c
FROM matched_addresses 
GROUP BY 1
ORDER BY c DESC
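
To try the CROSS JOIN plus ARRAY_AGG pattern without access to any tables, here is a small self-contained sketch. The zones and addresses CTEs, their column names, and the rectangular polygon are made-up illustration data, not anything from the question's dataset; only the first coordinate pair is taken from the question.

```sql
#standardSQL
-- Hypothetical inline data, for illustration only.
WITH zones AS (
  SELECT
    'zone_a' AS zone_name,
    ST_GEOGFROMTEXT('POLYGON((-96 38, -95 38, -95 40, -96 40, -96 38))') AS zone_geom
), addresses AS (
  SELECT 1 AS address_id, -95.221080 AS longitude, 38.974500 AS latitude
  UNION ALL
  SELECT 2, -90.0, 30.0  -- lies outside the polygon above
)
SELECT
  a.address_id,
  -- Keep the name of the first containing zone; non-matching pairs yield NULL
  -- and are dropped by IGNORE NULLS, so an address with no containing zone
  -- is expected to come back with a NULL zone_name.
  ARRAY_AGG(
    IF(
      ST_CONTAINS(z.zone_geom, SAFE.ST_GEOGPOINT(a.longitude, a.latitude)),
      z.zone_name, NULL)
    IGNORE NULLS LIMIT 1)[OFFSET(0)] AS zone_name
FROM addresses a
CROSS JOIN zones z
GROUP BY a.address_id
ORDER BY a.address_id
```

Grouping by the address key is what keeps every address in the output, whether or not any zone contains it. In the queries above, the generated uuid plays that role.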