I am filtering a DataFrame on one of its long-typed columns, as shown below:
DataFrame jointsensorData2DoubleDF =//from external source
jointsensorData2DoubleDF
.filter(jointsensorData2DoubleDF.col("ts0").isNotNull())
.filter(jointsensorData2DoubleDF.col("ts0").notEqual(0L))
.persist(StorageLevel.MEMORY_AND_DISK_SER());
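I would like to know whether I can perform the 3 filters in a single filter, and whether that would improve speed?
Answer 0 (score: 0)
"I would like to know whether I can perform the 3 filters above in a single filter"
Rewriting the conditions as a single step does not affect the execution plan. With chained filters: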
scala> Seq[java.lang.Long]().toDF("t0").filter($"t0".isNotNull).filter($"t0" =!= 0).explain(true)
== Parsed Logical Plan ==
'Filter NOT ('t0 = 0)
+- Filter isnotnull(t0#107L)
   +- Project [value#105L AS t0#107L]
      +- LocalRelation <empty>, [value#105L]
== Analyzed Logical Plan ==
t0: bigint
Filter NOT (t0#107L = cast(0 as bigint))
+- Filter isnotnull(t0#107L)
   +- Project [value#105L AS t0#107L]
      +- LocalRelation <empty>, [value#105L]
== Optimized Logical Plan ==
LocalRelation <empty>, [t0#107L]
== Physical Plan ==
LocalTableScan <empty>, [t0#107L]
With a single conjunction, the optimized and physical plans come out the same:
scala> Seq[java.lang.Long]().toDF("t0").filter($"t0".isNotNull && $"t0" =!= 0).explain(true)
== Parsed Logical Plan ==
'Filter (isnotnull('t0) && NOT ('t0 = 0))
+- Project [value#114L AS t0#116L]
   +- LocalRelation <empty>, [value#114L]
== Analyzed Logical Plan ==
t0: bigint
Filter (isnotnull(t0#116L) && NOT (t0#116L = cast(0 as bigint)))
+- Project [value#114L AS t0#116L]
   +- LocalRelation <empty>, [value#114L]
== Optimized Logical Plan ==
LocalRelation <empty>, [t0#116L]
== Physical Plan ==
LocalTableScan <empty>, [t0#116L]
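Because both plans above are built on an empty relation, the optimizer collapses them to an empty scan and the filter itself is no longer visible. To check this on your own data, a minimal sketch along the following lines may help. It assumes Spark 2.x or later with a local SparkSession; the sample values and the object name CombinedFilterSketch are made up for illustration, and only the column name ts0 comes from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CombinedFilterSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for inspecting plans.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("combined-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the external source.
    val df = Seq[java.lang.Long](1L, 0L, null, 42L).toDF("ts0")

    // Two chained filters, as in the question.
    val chained = df
      .filter(col("ts0").isNotNull)
      .filter(col("ts0") =!= 0L)

    // The same predicates combined into one conjunction.
    val combined = df.filter(col("ts0").isNotNull && col("ts0") =!= 0L)

    // Print the parsed, analyzed, optimized and physical plans of each
    // variant so they can be compared side by side.
    chained.explain(true)
    combined.explain(true)

    spark.stop()
  }
}
```

If the optimized and physical plans printed for the two variants match, combining the filters by hand should not be expected to change the runtime. In the Java API from the question, the same conjunction can be written as col.isNotNull().and(col.notEqual(0L)).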