Filter DataFrame elements in one step

Time: 2017-09-15 06:38:01

Tags: apache-spark

I am filtering a DataFrame on one of its long-typed columns, as follows:

DataFrame jointsensorData2DoubleDF = // from external source
jointsensorData2DoubleDF
        .filter(jointsensorData2DoubleDF.col("ts0").isNotNull())
        .filter(jointsensorData2DoubleDF.col("ts0").notEqual(0L))
        .persist(StorageLevel.MEMORY_AND_DISK_SER());

I would like to know whether I can perform the 3 filters in a single filter, and whether doing so would improve speed?

1 Answer:

Answer 0: (score: 0)

  

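"I would like to know whether I can perform the 3 filters above in a single filter"

Rewriting the conditions as a single filter does not affect the execution plan. With two chained filters: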

scala> Seq[java.lang.Long]().toDF("t0").filter($"t0".isNotNull).filter($"t0" =!= 0).explain(true)
== Parsed Logical Plan ==
'Filter NOT ('t0 = 0)
+- Filter isnotnull(t0#107L)
   +- Project [value#105L AS t0#107L]
      +- LocalRelation <empty>, [value#105L]

== Analyzed Logical Plan ==
t0: bigint
Filter NOT (t0#107L = cast(0 as bigint))
+- Filter isnotnull(t0#107L)
   +- Project [value#105L AS t0#107L]
      +- LocalRelation <empty>, [value#105L]

== Optimized Logical Plan ==
LocalRelation <empty>, [t0#107L]

== Physical Plan ==
LocalTableScan <empty>, [t0#107L]

With a single conjunction:

scala> Seq[java.lang.Long]().toDF("t0").filter($"t0".isNotNull && $"t0" =!= 0).explain(true)
== Parsed Logical Plan ==
'Filter (isnotnull('t0) && NOT ('t0 = 0))
+- Project [value#114L AS t0#116L]
   +- LocalRelation <empty>, [value#114L]

== Analyzed Logical Plan ==
t0: bigint
Filter (isnotnull(t0#116L) && NOT (t0#116L = cast(0 as bigint)))
+- Project [value#114L AS t0#116L]
   +- LocalRelation <empty>, [value#114L]

== Optimized Logical Plan ==
LocalRelation <empty>, [t0#116L]

== Physical Plan ==
LocalTableScan <empty>, [t0#116L]
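
Both plans above come from an empty relation, so the optimizer collapses each query to an empty LocalRelation. To check the same thing on non-empty data, something like the following sketch could be used. It assumes Spark 2.x or later with a local SparkSession; the object name, app name, and sample values are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.SparkSession

object FilterPlanComparison {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed setup, not from the question).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-plan-comparison")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: a nullable long column named ts0.
    val df = Seq[java.lang.Long](null, 0L, 5L, 7L).toDF("ts0")

    // Two chained filters, as in the question.
    val chained = df.filter($"ts0".isNotNull).filter($"ts0" =!= 0L)
    // The same conditions combined into one filter.
    val combined = df.filter($"ts0".isNotNull && $"ts0" =!= 0L)

    // Print both optimized logical plans so they can be compared by eye.
    println(chained.queryExecution.optimizedPlan)
    println(combined.queryExecution.optimizedPlan)

    // Both variants should return the same rows.
    assert(chained.collect().toSet == combined.collect().toSet)

    spark.stop()
  }
}
```

In the Java API used in the question, the same conjunction can be written with `Column.and`, for example `df.col("ts0").isNotNull().and(df.col("ts0").notEqual(0L))` inside a single `filter` call.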