ESRI Hive ST_Contains does not work properly

Time: 2016-02-12 20:17:10

Tags: hadoop sdk hive spatial esri

I am trying this with the JARs I could find (I am not sure they are the best choice for this, but I need to use ESRI and do it in Hive):

ADD JAR /home/user/lib/esri-geometry-api-1.2.1.jar;
ADD JAR /home/user/lib/spatial-sdk-hive-1.1.1-SNAPSHOT.jar;
ADD JAR /home/user/lib/esri-geometry-api.jar;
ADD JAR /home/user/lib/spatial-sdk-hadoop.jar;

CREATE TEMPORARY FUNCTION ST_Polygon AS 'com.esri.hadoop.hive.ST_Polygon';
CREATE TEMPORARY FUNCTION ST_Point AS 'com.esri.hadoop.hive.ST_Point';
CREATE TEMPORARY FUNCTION ST_Contains AS 'com.esri.hadoop.hive.ST_Contains';
CREATE TEMPORARY FUNCTION ST_Geometry AS 'com.esri.hadoop.hive.ST_Geometry';

Running the following query:

SELECT IF(1=1, 40.7484445, 0) AS latitude, IF(1=1, -73.9878531, 0) AS longitude
FROM any_table
WHERE NOT ST_Contains(
  ST_POLYGON('POLYGON((170.0 20.0, -170.0 73.0, -50.0 20.0, -50.0 73.0))'),
  ST_Point(CAST(longitude AS DOUBLE), CAST(latitude AS DOUBLE)))
LIMIT 1;

Here the polygon 'POLYGON((170.0 20.0, -170.0 73.0, -50.0 20.0, -50.0 73.0))' is roughly a box around the USA, and the given coordinates belong to New York. The result is supposed to be empty with WHERE NOT, but the query still returns these coordinates. It does not filter as it is supposed to.

What am I doing wrong?

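2 answers:

Answer 0: (score: 2)

Only one version of the geometry API should be loaded. Likewise, load either spatial-sdk-hadoop alone, or spatial-sdk-json together with spatial-sdk-hive, but not both.

A WKT polygon is closed by an end vertex that repeats the start vertex.

The polygon needs to be specified by its perimeter, with the vertices in order around the boundary, not in zigzag order.

The Geometry API is planar and does not support wrapping around the International Date Line.

The first longitude was probably meant to be -170 rather than +170.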

wget https://github.com/Esri/spatial-framework-for-hadoop/releases/download/v1.1/spatial-sdk-hive-1.1.jar \
  https://github.com/Esri/spatial-framework-for-hadoop/releases/download/v1.1/spatial-sdk-json-1.1.jar \
  https://github.com/Esri/geometry-api-java/releases/download/v1.2.1/esri-geometry-api-1.2.1.jar

hive -S
add jar /pathto/esri-geometry-api-1.2.1.jar
  /pathto/spatial-sdk-json-1.1.jar
  /pathto/spatial-sdk-hive-1.1.jar;
create temporary function ST_AsBinary as 'com.esri.hadoop.hive.ST_AsBinary';
-- ...

select ST_Contains(ST_Polygon(1,1, 1,4, 4,4, 4,1), ST_Point(2, 3));
true
select ST_Contains(ST_Polygon('POLYGON((1 1, 1 4, 4 4, 4 1, 1 1))'), ST_Point(2, 3));
true
select ST_Contains(ST_POLYGON('POLYGON((-170.0 20.0, -170.0 73.0, -50.0 20.0, -50.0 73.0, -170.0 20.0))'), ST_Point(-73.9878531, 40.7484445));
true
select not ST_Contains(ST_POLYGON('POLYGON((-170.0 20.0, -170.0 73.0, -50.0 20.0, -50.0 73.0, -170.0 20.0))'), ST_Point(-73.9878531, 40.7484445));
false
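
Applying the points above to the query from the question gives something like the following sketch. It assumes the intended box spans longitudes -170 to -50 and latitudes 20 to 73, lists the corners in perimeter order, and repeats the first vertex to close the ring. The subquery alias t is introduced here only for illustration, and any_table is the placeholder table from the question.

```sql
-- Sketch only: assumes a single geometry API JAR plus spatial-sdk-json and
-- spatial-sdk-hive are loaded, and that ST_Polygon, ST_Point and ST_Contains
-- are registered as temporary functions as shown earlier.
SELECT t.latitude, t.longitude
FROM (
  -- Hypothetical subquery: exposes the test coordinates as real columns,
  -- so the outer WHERE clause does not depend on SELECT aliases.
  SELECT 40.7484445 AS latitude, -73.9878531 AS longitude
  FROM any_table
  LIMIT 1
) t
WHERE NOT ST_Contains(
  -- Corners in perimeter order, with the first vertex repeated at the end.
  ST_Polygon('POLYGON((-170.0 20.0, -170.0 73.0, -50.0 73.0, -50.0 20.0, -170.0 20.0))'),
  ST_Point(CAST(t.longitude AS DOUBLE), CAST(t.latitude AS DOUBLE)));
```

Since the New York point lies inside this box, NOT ST_Contains(...) should evaluate to false and the query should return no rows.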

Answer 1: (score: 0)

add jar /home/..../esri-geometry-api-1.2.1.jar;
add jar /home/..../spatial-sdk-json-1.2.0.jar;
add jar /home/..../spatial-sdk-hive-1.2.0.jar;
add jar /home/..../spatial-sdk-hadoop.jar;

create temporary function ST_AsBinary as 'com.esri.hadoop.hive.ST_AsBinary';

CREATE TEMPORARY FUNCTION ST_Polygon AS 'com.esri.hadoop.hive.ST_Polygon';

CREATE TEMPORARY FUNCTION ST_Point AS 'com.esri.hadoop.hive.ST_Point';

CREATE TEMPORARY FUNCTION ST_Contains AS 'com.esri.hadoop.hive.ST_Contains';

CREATE TEMPORARY FUNCTION ST_Geometry AS 'com.esri.hadoop.hive.ST_Geometry';

A) Create a Hive table for the GeoJSON data:

CREATE TABLE default.lim_xxx_pais
(
  NOM_PLAN string,
  NMO_PLAN string,
  APROXIMADO string,
  ID1 string,
  BoundaryShape binary
)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.GeoJsonSerDe'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedGeoJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

B) Load the GeoJSON file into the table:

LOAD DATA INPATH '/user/.../lim_xxx_pais.geojson' OVERWRITE INTO TABLE lim_xxx_pais;

C) Select the rows whose boundary shape contains the point:

select NOM_PLAN, NMO_PLAN, APROXIMADO, ID1
from default.lim_xxx_pais aa
where ST_Contains(aa.boundaryshape, ST_Point(-72.08726603, -36.62627804));
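
If the query in C) returns nothing, it can help to look at what was actually loaded. The following is a minimal sketch, assuming the table from step A) and that ST_AsText (part of the same Esri Hive UDF package) is registered in the session. The column alias boundary_wkt is made up for illustration.

```sql
-- Sketch only: prints a few loaded shapes as WKT so the coordinate order
-- (longitude first, then latitude) and the ring closure can be checked by eye.
CREATE TEMPORARY FUNCTION ST_AsText AS 'com.esri.hadoop.hive.ST_AsText';

SELECT NOM_PLAN, ST_AsText(boundaryshape) AS boundary_wkt
FROM default.lim_xxx_pais
LIMIT 5;
```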