Based on log time

Time: 2017-09-27 07:32:22

Tags: java google-cloud-storage google-cloud-dataflow google-cloud-pubsub

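I am parsing logs from Pub/Sub. The goal is to write these logs to hourly files in a custom location, where both the location and the hour are based on the log timestamp (a field in the Pub/Sub log).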

Each file should receive all the data for a particular hour, and files should keep being appended to every hour. For example: GS://bucket/applog/2017-09-27/application1/app-2017-09-27-11H.log
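To make the target layout concrete, here is a minimal sketch that derives such a path from a log timestamp using only java.time. The class name HourlyLogPath, the method pathFor, and the timestamp pattern are assumptions for illustration; the actual pattern behind CommonConstants.DATE_FORMAT_YYYYMMDD_HHMMSS_SSS is not shown in the question.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the pipeline in the question.
class HourlyLogPath {

    // Assumed log-time pattern; the real constant's value is not shown.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");

    // Builds <root>/<day>/<application>/app-<day>-<hour>H.log from the log time.
    static String pathFor(String bucketRoot, String application, String logTime) {
        LocalDateTime t = LocalDateTime.parse(logTime, LOG_TIME);
        return bucketRoot + "/" + t.format(DAY) + "/" + application
                + "/app-" + t.format(HOUR) + "H.log";
    }

    public static void main(String[] args) {
        // Prints gs://bucket/applog/2017-09-27/application1/app-2017-09-27-11H.log
        System.out.println(
                pathFor("gs://bucket/applog", "application1", "2017-09-27 11:42:07.123"));
    }
}
```

The path depends only on the log time, so every log line from the same hour maps to the same name regardless of when it is processed.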

pushFilePColl.apply(Window.into(new FileTextIOWindowFn()))
    .apply("FileTO to LOG TextIO", ParDo.of(new TextIOWriteDoFn()))
    .apply(TextIO.write().to(pipelineOptions.getFileStorageBucket())
        .withWindowedWrites()
        .withFilenamePolicy(new FileStorageFileNamePolicy(logTypeEnum))
        .withNumShards(10));

Custom window:

public class FileTextIOWindowFn extends NonMergingWindowFn<Object, IntervalWindow> {

private static final long serialVersionUID = 1L;

private IntervalWindow assignWindow(AssignContext context) {
    FilePushTO filePushTO = (FilePushTO) context.element();
    String timestamp = filePushTO.getLogTime();
    DateTimeFormatter formatter = DateTimeFormat.forPattern(CommonConstants.DATE_FORMAT_YYYYMMDD_HHMMSS_SSS)
            .withZoneUTC();
    Instant start_point = Instant.parse(timestamp, formatter);
    Calendar cal = DateUtil.getCurrentDateInUTC();
    SimpleDateFormat DATE_FORMATER_PARTITION_NAME = DateUtil.getDateFormater();
    Instant end_point = Instant.parse(DATE_FORMATER_PARTITION_NAME.format(cal.getTime()), formatter);
    return new IntervalWindow(start_point, end_point);
}

@Override
public Coder<IntervalWindow> windowCoder() {
    return IntervalWindow.getCoder();
}

@Override
public Collection<IntervalWindow> assignWindows(AssignContext c) throws Exception {
    return Arrays.asList(assignWindow(c));
}

@Override
public boolean isCompatible(WindowFn<?, ?> other) {
    return false;
}

@Override
public WindowMappingFn<IntervalWindow> getDefaultWindowMappingFn() {
    throw new IllegalArgumentException(
            "Attempted to get side input window for GlobalWindow from non-global WindowFn");
}

}

Filename policy:

public class FileStorageFileNamePolicy extends FileBasedSink.FilenamePolicy {
private static final long serialVersionUID = 1L;

private static Logger LOGGER = LoggerFactory.getLogger(FileStorageFileNamePolicy.class);

private LogTypeEnum logTypeEnum;

public FileStorageFileNamePolicy(LogTypeEnum logTypeEnum) {
    this.logTypeEnum = logTypeEnum;
}

@Override
public ResourceId windowedFilename(ResourceId outputDirectory, WindowedContext context, String extension) {
    IntervalWindow window = (IntervalWindow) context.getWindow();
    String startDate = window.start().toString();
    String dateString = startDate.replace("T", CommonConstants.SPACE)
            .replaceAll(startDate.substring(startDate.indexOf("Z")), CommonConstants.EMPTY_STRING);
    String startDateHour = startDate;
    try {
        startDate = DateUtil.getDateForFileStore(dateString, null);
        startDateHour = DateUtil.getDTLocalTZHour(dateString, null);
    } catch (ParseException e) {
        LOGGER.error("Error converting date  : {}", e);
    }
    String filename = new StringBuilder(window.start().toString()).append(CommonConstants.COLON)
            .append(startDateHour).append(CommonConstants.UNDER_SCORE).append(context.getShardNumber())
            .append(".txt").toString();
    String dirName = new StringBuilder(startDate).append(CommonConstants.FORWARD_SLASH)
            .append(logTypeEnum.getValue().toLowerCase()).append(CommonConstants.FORWARD_SLASH).toString();
    LOGGER.info("Directory : {} and File Name : {}", dirName, filename);
    return outputDirectory.resolve(dirName, ResolveOptions.StandardResolveOptions.RESOLVE_DIRECTORY)
            .resolve(filename, ResolveOptions.StandardResolveOptions.RESOLVE_FILE);
}

@Override
public ResourceId unwindowedFilename(ResourceId outputDirectory, Context context, String extension) {
    throw new UnsupportedOperationException("Unsupported.");
}

}

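I created the custom window using IntervalWindow so that I can get the appropriate timestamp in the FilenamePolicy. I can't use FixedWindows, because it would always give me the current timestamp.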
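Note that assignWindow above starts each window at the element's own log time and ends it at the current time, so two logs from the same hour generally get different windows. For comparison, the sketch below computes bounds that cover exactly the hour containing the log time, using only java.time rather than Beam's IntervalWindow. The class name HourWindowBounds and its methods are hypothetical, and this is an illustration of the arithmetic, not a confirmed fix for the overwrite problem.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper illustrating hour-aligned window bounds (UTC).
class HourWindowBounds {

    private static final long HOUR_MILLIS = Duration.ofHours(1).toMillis();

    // Start of the hour that contains the given log time.
    static Instant startOf(Instant logTime) {
        long millis = logTime.toEpochMilli();
        // floorMod keeps the result correct for instants before the epoch too.
        return Instant.ofEpochMilli(millis - Math.floorMod(millis, HOUR_MILLIS));
    }

    // Exclusive end of that hour: exactly one hour after its start.
    static Instant endOf(Instant logTime) {
        return startOf(logTime).plusMillis(HOUR_MILLIS);
    }

    public static void main(String[] args) {
        Instant logTime = Instant.parse("2017-09-27T11:42:07.123Z");
        // Prints 2017-09-27T11:00:00Z .. 2017-09-27T12:00:00Z
        System.out.println(startOf(logTime) + " .. " + endOf(logTime));
    }
}
```

With bounds like these, every element whose log time falls in the same hour shares one window, so a filename derived from the window start is the same for all of them.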

Everything here works perfectly, except that the files are not appended to. They are overwritten.

1 answer:

Answer 0 (score: 1)

You can do this with TextIO.write().to(...).withWindowedWrites(), which is available in Beam 2.1. See the TextIO javadoc.