I want to filter MongoDB documents by ObjectId from a Spark program. I have tried the following:
case class _id(oid: String)
val str_start: _id = _id((start.getMillis() / 1000).toHexString + "0000000000000000")
val str_end: _id = _id((end.getMillis() / 1000).toHexString + "0000000000000000")
val filteredDF = df.filter(
$"_timestamp".isNotNull
.and($"_timestamp".between(new Timestamp(start.getMillis()), new Timestamp(end.getMillis())))
.and($"_id").between(str_start, str_end))
or
val str_start = (start.getMillis() / 1000).toHexString + "0000000000000000"
val str_end = (end.getMillis() / 1000).toHexString + "0000000000000000"
val filteredDF = df.filter(
$"_timestamp".isNotNull
.and($"_timestamp".between(new Timestamp(start.getMillis()), new Timestamp(end.getMillis())))
.and($"_id.oid").between(str_start, str_end))
Both give me an AnalysisException:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '(((`_timestamp` IS NOT NULL) AND ((`_timestamp` >= TIMESTAMP('2017-07-31 00:22:00.0')) AND (`_timestamp` <= TIMESTAMP('2017-08-01 00:22:00.0')))) AND `_id`)' due to data type mismatch: differing types in '(((`_timestamp` IS NOT NULL) AND ((`_timestamp` >= TIMESTAMP('2017-07-31 00:22:00.0')) AND (`_timestamp` <= TIMESTAMP('2017-08-01 00:22:00.0')))) AND `_id`)' (boolean and struct<oid:string>).;;
'Filter (((((isnotnull(_timestamp#40) && ((_timestamp#40 >= 1501449720000000) && (_timestamp#40 <= 1501536120000000))) && _id#38) >= 597e4df80000000000000000) && (((isnotnull(_timestamp#40) && ((_timestamp#40 >= 1501449720000000) && (_timestamp#40 <= 1501536120000000))) && _id#38) <= 597f9f780000000000000000))
How can I query on the oid?
Thanks, Neil
Answer 0 (score: 0)
I think you have misplaced your parentheses: it should be something like

and($"_id.oid".between(str_start, str_end))

(That is why you got the error message

(boolean and struct<oid:string>)

— in your version the `and` is applied to the `_id` column itself rather than to the result of `between`.)
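Putting that fix together with the second snippet from the question, the filter might look like the following. This is only a sketch: it assumes `df`, `start`, and `end` are defined as in the question, and relies on the fact that the first four bytes of a MongoDB ObjectId encode seconds since the epoch, so a hex timestamp padded with sixteen zeros is a valid lexicographic bound on the oid string.

```scala
import java.sql.Timestamp

// Lower and upper ObjectId bounds: hex seconds-since-epoch + 16 zero digits.
val str_start = (start.getMillis() / 1000).toHexString + "0000000000000000"
val str_end   = (end.getMillis() / 1000).toHexString + "0000000000000000"

val filteredDF = df.filter(
  $"_timestamp".isNotNull
    .and($"_timestamp".between(new Timestamp(start.getMillis()), new Timestamp(end.getMillis())))
    // Parentheses now enclose the whole between(...) call, so `and`
    // combines two boolean columns instead of a boolean and a struct.
    .and($"_id.oid".between(str_start, str_end))
)
```

Note that `$"_id.oid"` resolves the `oid` field inside the `_id` struct, which is why the second attempt was closer: comparing the whole struct column against a string is what produced the type mismatch in the first attempt.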