In the code below, I try to use Beam's TextIO to read a CSV file from dataFile and filter out its header line, but it fails to compile with this message:
Error:(ROW, COLUMN) overloaded method value by with alternatives:
[T, PredicateT <: org.apache.beam.sdk.transforms.SerializableFunction[T,Boolean]](x$1: PredicateT)org.apache.beam.sdk.transforms.Filter[T] <and>
[T, PredicateT <: org.apache.beam.sdk.transforms.ProcessFunction[T,Boolean]](x$1: PredicateT)org.apache.beam.sdk.transforms.Filter[T]
cannot be applied to (org.apache.beam.sdk.transforms.SimpleFunction[String,Boolean])
.by(nonHeaderFilter))
Code:
val nonHeaderFilter: SimpleFunction[String, Boolean] = new SimpleFunction[String, Boolean]() {
  override def apply(input: String): Boolean = {
    input != MyClass.CsvHeader
  }
}

def readDataFile(input: PBegin, dataFile: String): PCollection[String] = {
  input
    .apply("Read Data File", TextIO.read().from(dataFile))
    .apply("Filter Header Line", Filter.by(nonHeaderFilter))
}
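I think the problem is related to the fact that SerializableFunction is a ProcessFunction and SimpleFunction is a SerializableFunction, so both overloads of Filter.by look applicable. Somehow Scala does not resolve this correctly.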
Any suggestions on how to avoid this problem, or am I misunderstanding something?
Edit (workaround):
To work around this for now, in case anyone else runs into it, I created a static Java method that provides the required filter:
import org.apache.beam.sdk.transforms.Filter;

public class BeamTransformProvider {
  public static Filter<String> notEqualFilter(String value) {
    return Filter.by(input -> !input.equals(value));
  }
}
It can be used as follows:
def readDataFile(input: PBegin, dataFile: String): PCollection[String] = {
  input
    .apply("Read Data File", TextIO.read().from(dataFile))
    .apply("Filter Header Line", BeamTransformProvider.notEqualFilter(MyClass.CsvHeader))
}