I need to run a SQL query from an AWS Glue job (spark.sql) against a table that I also query from Athena.
My query is very simple:
# Quote each value, join them into an IN list, and substitute it into the query.
df = spark.sql("""select * from hashes
where year='2019' and month='10' and day='08'
and myhashes in (%s) order by timestamp desc""" % (
",".join("'" + str(x) + "'" for x in myhashes)))
This code produces a string like:
select * from hashes where year='2019'
and month='10' and day='08'
and myhashes in (
'06SN931',
'06SN931',
'06SP317',
...........
'86X0297'
)
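It works fine in Athena.
However, when I run it in the Glue job, Spark seems to convert the query from IN to OR syntax, for example
where .... day = '08' and (myhashes = '06XH8V3' or myhashes = '06X68P4' or my .....), and this produces an error.
Here is the exception: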
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:759)
... 64 more
Caused by: MetaException(message:1 validation error detected: Value 'year = '2019' and month = '10' and day = '08' and (myhashes = '06XH58V3' or myhashes = '06X658P4' or myhashes = '45X42051' or myhashes = '15S03560' or myhashes = '10S2868' or myhashes = '416S2661' or myhashes = 'dDSD' or myhashes = 'DSSD' or myhashes = '13XE639' or myhashes = '06X668N7' or myhashes = '06X364T2' or
.......
myhashes = '96S652207' or myhashes = '06X26365M' or myhashes = '10X560c89' or myhashes = '06X01N8' or )'
at 'expression' failed to satisfy constraint: Member must have length less than or equal to 2048 (Service: AWSGlue; Status Code: 400; Error Code: ValidationException; Request ID: 83f7bc7b-0d10-11ea-9a8c-fdfadfa2a22b))
at com.amazonaws.glue.catalog.converters.CatalogToHiveConverter.getHiveException(CatalogToHiveConverter.java:100)
at com.amazonaws.glue.catalog.converters.CatalogToHiveConverter.wrapInHiveException(CatalogToHiveConverter.java:88)
at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.getCatalogPartitions(GlueMetastoreClientDelegate.java:948)
at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.getPartitions(GlueMetastoreClientDelegate.java:911)
at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.listPartitionsByFilter(AWSCatalogMetastoreClient.java:1179)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Hive.java:2255)
... 69 more
End of LogType:stdout
Is it possible to disable Spark's internal optimization of the SQL?
Answer 0 (score: 0)
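The error message implies that your query is too long: the filter expression that reaches AWS Glue exceeds 2048 characters. AWS Athena and AWS Glue have different constraints.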
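To get a feel for how quickly that limit is reached, here is a rough sketch that rebuilds an OR expression of the shape seen in the exception and measures it. The helper name `or_expression` and the constant `GLUE_EXPRESSION_LIMIT` are made up for illustration; the exact text Spark generates may differ slightly, so treat the numbers as an estimate.

```python
# Limit quoted in the ValidationException above.
GLUE_EXPRESSION_LIMIT = 2048


def or_expression(column, values):
    """Approximate the OR chain that the IN list was expanded into."""
    return "(" + " or ".join("%s = '%s'" % (column, v) for v in values) + ")"


def exceeds_limit(column, values, prefix=""):
    """Return True if the prefix plus the OR chain is longer than the limit."""
    return len(prefix + or_expression(column, values)) > GLUE_EXPRESSION_LIMIT


if __name__ == "__main__":
    sample = ["06SN931"] * 100  # 100 seven-character values
    partition_prefix = "year = '2019' and month = '10' and day = '08' and "
    print(len(or_expression("myhashes", sample)))
    print(exceeds_limit("myhashes", sample, partition_prefix))
```

With values of this size each term costs roughly 24 characters including the " or " separator, so on the order of 80 values is already enough to hit the limit.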
If possible, try to filter the table ("hashes") by joining it with a table that contains the myhashes values, rather than using SQL IN, once the number of elements to compare against gets large.
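A minimal PySpark sketch of that join-based approach follows. The function names `to_rows` and `filter_by_hashes` are hypothetical, the column and table names are taken from the question, and `spark` is assumed to be the SparkSession already available in the Glue job. Whether this avoids the 2048-character limit in your job depends on how Spark plans the query, so treat it as something to try rather than a guaranteed fix.

```python
def to_rows(myhashes):
    """Deduplicate the values (keeping order) and wrap each one in a 1-tuple."""
    return [(h,) for h in dict.fromkeys(str(x) for x in myhashes)]


def filter_by_hashes(spark, myhashes, year="2019", month="10", day="08"):
    """Filter `hashes` by joining against the wanted values instead of using IN."""
    # Imported here so the helper above can be used without a Spark runtime.
    from pyspark.sql import functions as F

    # Small single-column DataFrame holding the values to match.
    wanted = spark.createDataFrame(to_rows(myhashes), ["myhashes"])

    # Keep the short partition predicates as ordinary filters.
    partition = spark.table("hashes").where(
        (F.col("year") == year) & (F.col("month") == month) & (F.col("day") == day)
    )

    # left_semi keeps rows of `partition` that have a match in `wanted`
    # and returns only the columns of `partition`.
    return (
        partition.join(F.broadcast(wanted), on="myhashes", how="left_semi")
        .orderBy(F.col("timestamp").desc())
    )
```

In the job this would be called as df = filter_by_hashes(spark, myhashes). The broadcast hint is a design choice that assumes the list of values is small enough to ship to every executor.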