Pushdown filters are not reset

Time: 2018-08-26 11:59:27

Tags: apache-spark

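I am developing my own data source reader and have implemented filter pushdown. My problem is that pushFilters() is never called without filters, so nothing tells the reader to reset the filters that were defined earlier. Here is a Spark shell session that demonstrates the issue, with debug statements showing which methods are called.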

1) Initial unfiltered load/show

scala> val df = spark.read.format("MyDataSource").
    option("function", "testSpartan").
    option("loglevel", "debug").load
18/08/26 07:42:56.580 DEBUG dr: MyDataSourceReader()
18/08/26 07:42:56.580 DEBUG dr:   function: testSpartan
18/08/26 07:42:56.580 DEBUG dr:   loglevel: debug
df: org.apache.spark.sql.DataFrame = [jcolumn: bigint, ccolumn: string]

scala> df.show
18/08/26 07:43:33.659 DEBUG dr: pruneColums()
18/08/26 07:43:33.659 DEBUG dr:   StructField(jcolumn,LongType,false)
18/08/26 07:43:33.659 DEBUG dr:   StructField(ccolumn,StringType,false)
18/08/26 07:43:33.659 DEBUG dr: pushedFilters()
18/08/26 07:43:33.659 DEBUG dr: pushedFilters()
18/08/26 07:43:33.659 DEBUG dr: pushedFilters()
18/08/26 07:43:33.659 DEBUG dr: pushedFilters()
18/08/26 07:43:33.678 DEBUG dr: createBatchDataReaderFactories()
18/08/26 07:43:33.699 DEBUG dr: next()
18/08/26 07:43:33.701 DEBUG dr: get()
18/08/26 07:43:33.701 DEBUG dr: next()
+-------+-------+
|jcolumn|ccolumn|
+-------+-------+
|      0|      a|
|      1|      b|
|      2|      c|
|      3|      a|
|      4|      b|
|      5|      c|
|      6|      a|
|      7|      b|
|      8|      c|
|      9|      a|
+-------+-------+

2) A call with a simple filter. Note the call to pushFilters().

scala> df.filter("jcolumn<2").show
18/08/26 07:45:42.500 DEBUG dr: pushedFilters()
18/08/26 07:45:42.500 DEBUG dr: pushedFilters()
18/08/26 07:45:42.500 DEBUG dr: pushedFilters()
18/08/26 07:45:42.500 DEBUG dr: pushedFilters()
18/08/26 07:45:42.501 DEBUG dr: pushFilters()
18/08/26 07:45:42.501 DEBUG dr:   LessThan(jcolumn,2)
18/08/26 07:45:42.501 DEBUG dr: pruneColums()
18/08/26 07:45:42.501 DEBUG dr:   StructField(jcolumn,LongType,false)
18/08/26 07:45:42.501 DEBUG dr:   StructField(ccolumn,StringType,false)
18/08/26 07:45:42.501 DEBUG dr: pushedFilters()
18/08/26 07:45:42.501 DEBUG dr: pushedFilters()
18/08/26 07:45:42.501 DEBUG dr: pushedFilters()
18/08/26 07:45:42.501 DEBUG dr: pushedFilters()
18/08/26 07:45:42.512 DEBUG dr: createBatchDataReaderFactories()
18/08/26 07:45:42.529 DEBUG dr: next()
18/08/26 07:45:42.532 DEBUG dr: get()
18/08/26 07:45:42.532 DEBUG dr: next()
+-------+-------+
|jcolumn|ccolumn|
+-------+-------+
|      0|      a|
|      1|      b|
+-------+-------+

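3) A subsequent call with no filter. As you can see, pushFilters() is not called with an empty Filter array, and the result still contains only the two rows from the previous filtered query. I am not sure what "signal" I am supposed to receive in order to reset the previously pushed filters.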

scala> df.show
18/08/26 07:46:21.442 DEBUG dr: pruneColums()
18/08/26 07:46:21.442 DEBUG dr:   StructField(jcolumn,LongType,false)
18/08/26 07:46:21.442 DEBUG dr:   StructField(ccolumn,StringType,false)
18/08/26 07:46:21.443 DEBUG dr: pushedFilters()
18/08/26 07:46:21.443 DEBUG dr: pushedFilters()
18/08/26 07:46:21.443 DEBUG dr: pushedFilters()
18/08/26 07:46:21.443 DEBUG dr: pushedFilters()
18/08/26 07:46:21.452 DEBUG dr: createBatchDataReaderFactories()
18/08/26 07:46:21.468 DEBUG dr: next()
18/08/26 07:46:21.470 DEBUG dr: get()
18/08/26 07:46:21.471 DEBUG dr: next()
+-------+-------+
|jcolumn|ccolumn|
+-------+-------+
|      0|      a|
|      1|      b|
+-------+-------+
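
To make the behaviour above concrete, here is a minimal stand-alone model of it, together with one possible workaround. It deliberately does not use the Spark API: `ModelReader`, `planScan()`, `Row`, and this `LessThan` are hypothetical stand-ins, with `planScan()` playing the role of createBatchDataReaderFactories(). The workaround assumes, based only on the call order visible in the logs above, that pushFilters() (when it is called at all) always comes before the planning call for the same query, so the reader can snapshot its filters at planning time and then clear them. That ordering is an observation from this session, not a documented guarantee.

```scala
// Stand-alone model; no Spark dependency. All names here are hypothetical.
sealed trait Filter
final case class LessThan(attribute: String, value: Long) extends Filter

final case class Row(jcolumn: Long, ccolumn: String)

// Mimics a stateful reader: filters pushed for one query stay in the field
// until something clears them.
class ModelReader(resetAfterPlanning: Boolean) {
  // Same shape as the test data in the question: 0..9 with a, b, c repeating.
  private val data: Seq[Row] =
    (0L until 10L).map(i => Row(i, "abc".charAt((i % 3).toInt).toString))

  private var pushed: Array[Filter] = Array.empty

  def pushFilters(filters: Array[Filter]): Array[Filter] = {
    pushed = filters
    Array.empty // report that the source handles every filter itself
  }

  def pushedFilters(): Array[Filter] = pushed

  // Stand-in for createBatchDataReaderFactories(): take a snapshot of the
  // filters for this scan, then optionally forget them.
  def planScan(): Seq[Row] = {
    val snapshot = pushed
    if (resetAfterPlanning) pushed = Array.empty
    data.filter(row => snapshot.forall(f => matches(f, row)))
  }

  private def matches(filter: Filter, row: Row): Boolean = filter match {
    case LessThan("jcolumn", v) => row.jcolumn < v
    case _                      => true // unknown filters are not applied
  }
}

object Demo {
  // Returns (rows of the filtered query, rows of the following unfiltered query).
  def run(reset: Boolean): (Int, Int) = {
    val reader = new ModelReader(reset)
    reader.pushFilters(Array(LessThan("jcolumn", 2)))
    val filtered = reader.planScan().size
    val unfiltered = reader.planScan().size // no pushFilters() call in between
    (filtered, unfiltered)
  }

  def main(args: Array[String]): Unit = {
    println(run(reset = false)) // (2,2): the stale filter leaks into the second scan
    println(run(reset = true))  // (2,10): the second scan sees all rows again
  }
}
```

In a real reader the snapshot would have to be handed to the factories before the field is cleared. The approach would also break if the same query were planned more than once without a fresh pushFilters() call, so treat it as a sketch of the idea rather than a safe fix.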

0 answers:

No answers yet