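I'm trying to archive all events from several Google Pub/Sub topics into Google Cloud Storage. I currently have 10 topics, and the number is growing quickly.
I chose Google Dataflow because of its scalability and its integration with other Google services.
Right now I have a single Dataflow pipeline that consumes all of the topics.
When I write to a single output location, I can use windowing and the files are written out successfully.
I'm now trying to write the messages to different subfolders depending on which topic they came from (that information is available in the message).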
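The routing key has to be derived from each element inside getDestination. Below is a minimal plain-Java sketch of that step. It assumes, hypothetically, that each message is JSON with a "topic" field holding the full Pub/Sub topic path. The class and method names are illustrative only, and a real pipeline would more likely use a JSON parser or a Pub/Sub message attribute.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopicDestination {
    // Hypothetical message format: {"topic":"projects/<p>/topics/<name>", ...}
    private static final Pattern TOPIC_FIELD =
        Pattern.compile("\"topic\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the short topic name to use as a subfolder, or the fallback if none is found. */
    public static String destinationFor(String element, String fallback) {
        Matcher m = TOPIC_FIELD.matcher(element);
        if (!m.find()) {
            return fallback;
        }
        String topic = m.group(1);
        // Keep only the last path segment, e.g. "orders" from "projects/p/topics/orders".
        int slash = topic.lastIndexOf('/');
        return slash >= 0 ? topic.substring(slash + 1) : topic;
    }

    public static void main(String[] args) {
        System.out.println(destinationFor("{\"topic\":\"projects/p/topics/orders\",\"id\":1}", "unknown")); // prints "orders"
        System.out.println(destinationFor("{\"id\":2}", "unknown")); // prints "unknown"
    }
}
```

Returning a fallback rather than throwing keeps malformed messages flowing to a default folder, which mirrors the role of getDefaultDestination.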
When I debug the pipeline, it correctly enters the getDestination method, but it never seems to enter getFilenamePolicy, so nothing ever shows up in my Google Cloud Storage bucket.
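Am I missing something? Should I take a different approach?
I realize I could solve this by running a separate Dataflow pipeline per topic, but with the number of topics growing, I think that would be hard to maintain.
Pipeline code: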
PCollectionList.of(pcollections)
    .apply(Flatten.pCollections())
    .apply(
        options.getWindowDuration() + " Window",
        Window.into(FixedWindows.of(DurationUtils.parseDuration(options.getWindowDuration()))))
    // Apply windowed file writes. Destinations and filenames come from
    // DynamicWindowedFilenamePolicy; a NestedValueProvider converts the output
    // directory string into the ResourceId that withTempDirectory() needs at runtime.
    .apply(
        "Write File(s)",
        TextIO.write()
            .withWindowedWrites()
            .withNumShards(options.getNumShards())
            .to(
                new DynamicWindowedFilenamePolicy(
                    options.getOutputDirectory(),
                    options.getOutputFilenamePrefix(),
                    options.getOutputShardTemplate(),
                    options.getOutputFilenameSuffix()))
            .withTempDirectory(
                NestedValueProvider.of(
                    options.getOutputDirectory(),
                    (SerializableFunction<String, ResourceId>) input ->
                        FileBasedSink.convertToFileResourceIfPossible(input))));
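The Window.into(FixedWindows.of(...)) step is what gives each output file its time range. The following is a plain-Java model, not Beam code. It assumes FixedWindows with the default zero offset, which places an element in the window that starts at its timestamp rounded down to a multiple of the window size since the epoch. The 5-minute size and the timestamp are example values only.

```java
import java.time.Duration;
import java.time.Instant;

public class FixedWindowSketch {
    /** Start (epoch millis) of the fixed window containing timestampMillis, assuming zero offset. */
    public static long windowStart(long timestampMillis, long sizeMillis) {
        // floorMod keeps the remainder non-negative, so pre-epoch timestamps round down too.
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    public static void main(String[] args) {
        long size = Duration.ofMinutes(5).toMillis();
        long ts = Instant.parse("2018-03-01T10:07:30Z").toEpochMilli();
        long start = windowStart(ts, size);
        // prints "2018-03-01T10:05:00Z .. 2018-03-01T10:10:00Z"
        System.out.println(Instant.ofEpochMilli(start) + " .. " + Instant.ofEpochMilli(start + size));
    }
}
```

Every element whose timestamp falls in the same interval lands in the same window. Each (window, destination) pair is then expected to produce its own set of shard files.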
DynamicWindowedFilenamePolicy class:
import javax.annotation.Nullable;
import org.apache.beam.sdk.io.FileBasedSink;
import org.apache.beam.sdk.io.FileBasedSink.FilenamePolicy;
import org.apache.beam.sdk.io.FileBasedSink.OutputFileHints;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;

// Type parameters: <input element, destination key, formatted output record>.
public class DynamicWindowedFilenamePolicy
    extends FileBasedSink.DynamicDestinations<String, String, String> {
  private final ValueProvider<String> outputDirectory;
  private final ValueProvider<String> outputFilenamePrefix;
  private final ValueProvider<String> suffix;
  private final ValueProvider<String> shardTemplate;

  public DynamicWindowedFilenamePolicy(
      ValueProvider<String> outputDirectory,
      ValueProvider<String> outputFilenamePrefix,
      ValueProvider<String> shardTemplate,
      ValueProvider<String> suffix) {
    this.outputDirectory = outputDirectory;
    this.outputFilenamePrefix = outputFilenamePrefix;
    this.shardTemplate = shardTemplate;
    this.suffix = suffix;
  }

  public ResourceId windowedFilename(
      int shardNumber,
      int numShards,
      BoundedWindow window,
      PaneInfo paneInfo,
      OutputFileHints outputFileHints) {
    ...
  }

  private ResourceId resolveWithDateTemplates(
      ValueProvider<String> outputDirectoryStr, BoundedWindow window) {
    ...
  }

  @Override
  public String formatRecord(String record) {
    return record; // write each message unchanged
  }

  @Override
  public String getDestination(String element) {
    return "folder-determined-from-element";
  }

  @Override
  public String getDefaultDestination() {
    return "default-destination";
  }

  @Override
  public FilenamePolicy getFilenamePolicy(String destination) {
    return new FilenamePolicy() {
      @Override
      public ResourceId windowedFilename(
          int shardNumber,
          int numShards,
          BoundedWindow window,
          PaneInfo paneInfo,
          OutputFileHints outputFileHints) {
        // Qualify the call: an unqualified windowedFilename(...) here resolves to this
        // anonymous class's own method and recurses until a StackOverflowError.
        return DynamicWindowedFilenamePolicy.this.windowedFilename(
            shardNumber, numShards, window, paneInfo, outputFileHints);
      }

      @Nullable
      @Override
      public ResourceId unwindowedFilename(
          int shardNumber, int numShards, OutputFileHints outputFileHints) {
        // Only windowed writes are used; the original self-call here would also recurse forever.
        throw new UnsupportedOperationException("Unwindowed writes are not supported");
      }
    };
  }
}
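getFilenamePolicy(destination) receives the destination string, so one way to get per-topic subfolders is to build the path from that destination inside the returned policy. The plain-Java sketch below shows only the string construction. The directory layout, the timestamp format, and the helper names (expandShardTemplate, filenameFor) are hypothetical. The shard-template expansion is a simplified stand-in that handles only runs of S and N. Inside Beam, the resulting string would still have to be turned into a ResourceId, for example with FileSystems.matchNewResource(path, false).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PerDestinationFilename {
    // Hypothetical timestamp format for window bounds, always in UTC.
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmm").withZone(ZoneOffset.UTC);

    /** Replaces each run of 'S' with the zero-padded shard index and each run of 'N' with the shard count. */
    public static String expandShardTemplate(String template, int shard, int numShards) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            int j = i;
            while (j < template.length() && template.charAt(j) == c) {
                j++;
            }
            int width = j - i; // run length sets the zero-padding width
            if (c == 'S') {
                out.append(String.format("%0" + width + "d", shard));
            } else if (c == 'N') {
                out.append(String.format("%0" + width + "d", numShards));
            } else {
                out.append(template, i, j); // copy any other characters verbatim
            }
            i = j;
        }
        return out.toString();
    }

    /** Builds <dir>/<destination>/<prefix>-<windowStart>-<windowEnd>-<shard part><suffix>. */
    public static String filenameFor(String outputDirectory, String destination, String prefix,
            Instant windowStart, Instant windowEnd, String shardTemplate,
            int shard, int numShards, String suffix) {
        String dir = outputDirectory.endsWith("/") ? outputDirectory : outputDirectory + "/";
        return dir + destination + "/" + prefix + "-" + TS.format(windowStart) + "-"
            + TS.format(windowEnd) + "-" + expandShardTemplate(shardTemplate, shard, numShards)
            + suffix;
    }
}
```

For example, shard 3 of 10 for destination "orders" in the 10:05 to 10:10 UTC window on 2018-03-01, with the placeholder directory gs://my-bucket/archive, prefix "events", template "SSSSS-of-NNNNN" and suffix ".json", yields gs://my-bucket/archive/orders/events-20180301T1005-20180301T1010-00003-of-00010.json.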