Presto Mismatched Domain types: date vs integer

Date: 2018-11-26 05:32:37

Tags: amazon-s3 presto

I am getting the following error when I try to query a date-typed column in my table:

select d from aggregatedb.room_nights where d = cast(current_date as date) limit 10;

Error:

com.facebook.presto.spi.PrestoException:
Error opening Hive split s3://oyo/room_nights/year=2018/month=11/part-00016-59e12a09-bed9-4234-88d0-098df6d926bb-c000 (offset=0, length=1446668): Mismatched Domain types: date vs integer
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:220)
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:115)    
at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:160)    
at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:93)     
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44)    
at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)  
at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:216)   
at com.facebook.presto.operator.Driver.processInternal(Driver.java:373)     
at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:282)     
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)     
at com.facebook.presto.operator.Driver.processFor(Driver.java:276)  
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:973)   
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)   
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:495)    
at com.facebook.presto.$gen.Presto_0_208_x_0_10____20181124_123813_1.run(Unknown Source)    
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)  
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)  
at java.lang.Thread.run(Thread.java:748) 
Caused by: java.lang.IllegalArgumentException: Mismatched Domain types: date vs integer     
at com.facebook.presto.spi.predicate.Domain.checkCompatibility(Domain.java:229)     
at com.facebook.presto.spi.predicate.Domain.intersect(Domain.java:184)  
at com.facebook.presto.spi.predicate.TupleDomain.intersect(TupleDomain.java:197)    
at com.facebook.presto.spi.predicate.TupleDomain.overlaps(TupleDomain.java:300)     
at com.facebook.presto.hive.parquet.predicate.TupleDomainParquetPredicate.matches(TupleDomainParquetPredicate.java:94)  
at com.facebook.presto.hive.parquet.predicate.ParquetPredicateUtils.predicateMatches(ParquetPredicateUtils.java:116)    
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.lambda$createParquetPageSource$3(ParquetPageSourceFactory.java:181)    
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)    
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)   
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)    
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)     
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)   
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)    
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)   
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:182)

SHOW CREATE TABLE returns the type of column 'd' as date:

CREATE TABLE hive.aggregatedb.room_nights (
   other columns,
   d date,
   other columns,
   year integer,
   month integer
) WITH (
   external_location = 's3://oyo/room_nights',
   format = 'PARQUET',
   partitioned_by = ARRAY['year','month']
)
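
The failing split in the stack trace sits under year=2018/month=11, even though the table declares 'd' as date. One way to narrow this down (a hypothetical diagnostic, not something from the original post) is to run the same date filter one partition at a time, which shows whether only some partitions contain Parquet files whose statistics for 'd' are read as integers:

-- Hypothetical diagnostic: apply the date filter to one partition at a time
-- to find which partitions contain the mismatched Parquet files; the failing
-- split in the error above is under year=2018/month=11.
select count(*)
from aggregatedb.room_nights
where year = 2018
  and month = 11
  and d = current_date;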

Here is the query plan for the above query:

EXPLAIN select d from aggregatedb.room_nights where d = cast(current_date as date) limit 10;

- Output[d] => [d:date]                                                                                                          
         Cost: {rows: 10 (50B), cpu: ?, memory: 0.00, network: 50.00}                                                             
     - Limit[10] => [d:date]                                                                                                      
             Cost: {rows: 10 (50B), cpu: ?, memory: 0.00, network: 50.00}                                                         
         - LocalExchange[SINGLE] () => d:date                                                                                     
                 Cost: {rows: 10 (50B), cpu: ?, memory: 0.00, network: 50.00}                                                     
             - RemoteExchange[GATHER] => d:date                                                                                   
                     Cost: {rows: 10 (50B), cpu: ?, memory: 0.00, network: 50.00}                                                 
                 - LimitPartial[10] => [d:date]                                                                                   
                         Cost: {rows: 10 (50B), cpu: ?, memory: 0.00, network: 0.00}                                              
                     - ScanFilter[table = hive:aggregatedb:room_nights, originalConstraint = ("d" = DATE '2018-11-26'), filterPred
                             Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}/{rows: ? (?), cpu: ?, memory: 0.00, network:
                             LAYOUT: aggregatedb.room_nights                                                                      
                             d := HiveColumnHandle{name=d, hiveType=date, hiveColumnIndex=23, columnType=REGULAR}                 
                                 :: [[2018-11-26]]                                                                                
                             HiveColumnHandle{name=year, hiveType=int, hiveColumnIndex=-1, columnType=PARTITION_KEY}              
                                 :: [[2016], [2017], [2018], [2019]]                                                              
                             HiveColumnHandle{name=month, hiveType=int, hiveColumnIndex=-1, columnType=PARTITION_KEY}             
                                 :: [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]]
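
The plan above shows the date domain [[2018-11-26]] pushed down on 'd', alongside the integer domains on the partition keys year and month; the exception in the stack trace is thrown when that date domain is intersected with integer-typed column statistics from the Parquet file. As a quick sanity check (a sketch, not verified here), a query with no predicate on 'd' should avoid that intersection entirely:

-- Sanity-check sketch: with no predicate on d, no date domain is pushed into the
-- Parquet reader, so the mismatched column statistics are never intersected.
select d
from aggregatedb.room_nights
where year = 2018 and month = 11
limit 10;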

However, if I cast the 'd' column to timestamp, I am able to run the query:

EXPLAIN select d from aggregatedb.room_nights where cast(d as timestamp) = cast(current_date as timestamp) limit 10;

- Output[d] => [d:date]                                                                                                          
         Cost: {rows: 10 (50B), cpu: ?, memory: 0.00, network: 50.00}                                                             
     - Limit[10] => [d:date]                                                                                                      
             Cost: {rows: 10 (50B), cpu: ?, memory: 0.00, network: 50.00}                                                         
         - LocalExchange[SINGLE] () => d:date                                                                                     
                 Cost: {rows: 10 (50B), cpu: ?, memory: 0.00, network: 50.00}                                                     
             - RemoteExchange[GATHER] => d:date                                                                                   
                     Cost: {rows: 10 (50B), cpu: ?, memory: 0.00, network: 50.00}                                                 
                 - LimitPartial[10] => [d:date]                                                                                   
                         Cost: {rows: 10 (50B), cpu: ?, memory: 0.00, network: 0.00}                                              
                     - ScanFilter[table = hive:aggregatedb:room_nights, originalConstraint = (CAST("d" AS timestamp) = "$literal$t
                             Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}/{rows: ? (?), cpu: ?, memory: 0.00, network:
                             LAYOUT: aggregatedb.room_nights                                                                      
                             d := HiveColumnHandle{name=d, hiveType=date, hiveColumnIndex=23, columnType=REGULAR}                 
                             HiveColumnHandle{name=year, hiveType=int, hiveColumnIndex=-1, columnType=PARTITION_KEY}              
                                 :: [[2016], [2017], [2018], [2019]]                                                              
                             HiveColumnHandle{name=month, hiveType=int, hiveColumnIndex=-1, columnType=PARTITION_KEY}             
                                 :: [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]] 
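
In the same spirit as the timestamp cast above (a workaround sketch rather than a fix for the underlying file schema), wrapping 'd' in an expression appears to keep Presto from deriving a date domain for that column, at the cost of losing row-group pruning on 'd':

-- Workaround sketch: a varchar cast on both sides also prevents a date-typed
-- domain from being pushed into the Parquet reader, so the mismatched file
-- statistics are never intersected; it does scan more data than a pushed-down
-- filter would.
select d
from aggregatedb.room_nights
where cast(d as varchar) = cast(current_date as varchar)
limit 10;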

1 Answer:

Answer 0 (score: 0):