Writing an Apache Beam Filter in Scala

Date: 2019-11-21 18:41:21

Tags: scala apache-beam

I'm using Scala to work on an Apache Beam project built on the Java SDK, and I can't work out the syntax needed to use Filter.by. Here is an example of what I've tried:

    class Test extends SerializableFunction[String, Boolean] {
      def apply(m: String): Boolean = true
    }

    val pipeline = Pipeline.create();
    pipeline.apply(
      KafkaIO.read[String,String]()
        .withBootstrapServers("localhost:9092")
        .withTopic("test-topic")
        .withKeyDeserializer(classOf[StringDeserializer])
        .withValueDeserializer(classOf[StringDeserializer])
        .withoutMetadata()
    )
    .apply(Values.create())
    .apply(Filter.by((m: String) => true))
    // And I've also tried this:
    .apply(Filter.by(new Test()))

This gives me the following error:

[error] example.scala:61:19: overloaded method value by with alternatives:
[error]   [T, PredicateT <: org.apache.beam.sdk.transforms.SerializableFunction[T,Boolean]](predicate: PredicateT)org.apache.beam.sdk.transforms.Filter[T] <and>
[error]   [T, PredicateT <: org.apache.beam.sdk.transforms.ProcessFunction[T,Boolean]](predicate: PredicateT)org.apache.beam.sdk.transforms.Filter[T]
[error]  cannot be applied to (com.example.Test)
[error]     .apply(Filter.by(new Test()))
[error]                   ^
[error] one error found

The documentation for Filter.by is at https://beam.apache.org/releases/javadoc/2.2.0/org/apache/beam/sdk/transforms/Filter.html#by-PredicateT-

1 answer:

Answer 0 (score: 0)

First, you might be interested in scio, which works more cleanly with Scala.

Otherwise, I managed to pass a ProcessFunction to Filter.by with the Java SDK by explicitly specifying the type on the lambda (tested with Beam 2.16.0):

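    import org.apache.beam.sdk.testing.{PAssert, TestPipeline}
    import org.apache.beam.sdk.transforms.{Create, Filter, ProcessFunction}

    // Using a test pipeline outside of a JUnit @Rule.
    val pipeline = TestPipeline.create
    pipeline.enableAbandonedNodeEnforcement(false)

    // Applying a filter. The explicit type annotation selects the ProcessFunction
    // overload of Filter.by and fixes the result type to java.lang.Boolean.
    val predicate: ProcessFunction[String, java.lang.Boolean] = m => m.length == 3
    val output = pipeline.apply(Create.of("one", "two", "three"))
      .apply(Filter.by(predicate))
    PAssert.that(output).containsInAnyOrder("one", "two")

    // Run the test.
    pipeline.run()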

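(Note that the return type is java.lang.Boolean, not scala.Boolean. The Test class in the question is declared with scala.Boolean, which is consistent with neither overload of Filter.by accepting it.)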
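
To illustrate the java.lang.Boolean point without needing Beam on the classpath, here is a minimal sketch. The names FilterSketch, Fn, filterBy, and hasLengthThree are hypothetical stand-ins invented for this example; they are not part of the Beam SDK and only mimic the shape of a Java functional interface with a java.lang.Boolean result. It assumes Scala 2.12 or later, where a lambda can be converted to a single-abstract-method trait.

```scala
object FilterSketch {
  // Hypothetical stand-in for a Java-style functional interface
  // (similar in shape to ProcessFunction[InputT, OutputT]); not Beam's API.
  trait Fn[A, R] extends Serializable {
    def apply(a: A): R
  }

  // Hypothetical stand-in for Filter.by: like a Java signature, it only
  // accepts predicates whose result type is java.lang.Boolean.
  def filterBy[T](predicate: Fn[T, java.lang.Boolean]): List[T] => List[T] =
    elements => elements.filter(e => predicate(e).booleanValue)

  // The lambda body produces a scala.Boolean; because the expected result
  // type is java.lang.Boolean, Predef's implicit boxing conversion applies.
  val hasLengthThree: Fn[String, java.lang.Boolean] = m => m.length == 3

  // This would not compile: Fn[String, scala.Boolean] is a different,
  // unrelated type from Fn[String, java.lang.Boolean].
  // val wrong: Fn[String, Boolean] = m => m.length == 3
  // filterBy(wrong)

  def main(args: Array[String]): Unit =
    println(filterBy(hasLengthThree)(List("one", "two", "three"))) // prints List(one, two)
}
```

The type annotation on hasLengthThree does the same job as the annotation on predicate in the answer: it tells the compiler which interface the lambda should implement and which Boolean type it should return.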