I am using Apache Beam to continuously watch a bucket:
public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);
    p.apply(TextIO.read()
            .from("gs://myshard/sourcerepo/*")
            .watchForNewFiles(
                // Check for new files every 30 seconds
                Duration.standardSeconds(30),
                // Never stop checking for new files
                Watch.Growth.<String>never()))
        .apply(FlatMapElements
            .into(TypeDescriptors.strings())
            .via((String word) -> Arrays.asList(word.split("[^\\p{L}]+"))))
        .apply(Filter.by((String word) -> !word.isEmpty()))
        .apply(Count.perElement())
        .apply(MapElements
            .into(TypeDescriptors.strings())
            .via((KV<String, Long> wordCount) -> wordCount.getKey() + ": " + wordCount.getValue()))
        .apply(TextIO.write().to("gs://myshard/destrepo/wordcounts.txt"));
    p.run().waitUntilFinish();
}
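The problem is that the pipeline never gets past the first transform (the first apply), which is the one watching the folder; none of the later steps run. It may be something trivial, but I would really appreciate any help.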