How do I serialize JSON and write it to a file?

Time: 2019-04-23 22:50:33

Tags: java apache-beam apache-beam-io

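I have written an Apache Beam pipeline that reads a directory containing 99 other files, computes a checksum for each one, and creates a key-value pair per file. All I need to do now is write these key-value pairs to a manifest.json file. I am currently running into serialization problems; any advice or help would be greatly appreciated.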

Here is my code:

public class BeamPipeline {
    private static final Logger log = LoggerFactory.getLogger(BeamPipeline.class);
    public static interface MyOptions extends PipelineOptions {

        @Description("Input Path(with gs:// prefix)")
        String getInput();
        void setInput(String value);
    }

    public static void main(String[] args) {

        MyOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class);
        Pipeline p = Pipeline.create(options);
        JsonObject obj = new JsonObject();
        File dir = new File(options.getInput());
        for (File file : dir.listFiles()) {
                String inputString = file.toString();
                p
                        .apply("Match Files", FileIO.match().filepattern(inputString))
                        .apply("Read Files", FileIO.readMatches())
                        .apply(MapElements.via(new SimpleFunction<FileIO.ReadableFile, KV<String, String>>() {
                            public KV<String, String> apply(FileIO.ReadableFile file) {
                                String temp = null;
                                try {
                                    temp = file.readFullyAsUTF8String();
                                } catch (IOException e) {

                                }
                                String sha256hex = org.apache.commons.codec.digest.DigestUtils.sha256Hex(temp);

                                obj.addProperty(temp, sha256hex);
                                String json = obj.toString();

                                try (FileWriter fileWriter = new FileWriter("./manifest.json")) {
                                    fileWriter.write(json);
                                } catch (IOException e) {

                                }

                                return KV.of(file.getMetadata().resourceId().toString(), sha256hex);

                            }
                        }))
                        .apply("Print", ParDo.of(new DoFn<KV<String, String>, Void>() {
                            @ProcessElement
                            public void processElement(ProcessContext c) {


                                log.info(String.format("File: %s, SHA-256 %s", c.element().getKey(), c.element().getValue()));

                            }
                        }));
        }
        p.run();
    }
}

This is the error I currently get:

"main" java.lang.IllegalArgumentException: unable to serialize DoFnAndMainOutput{doFn=org.apache.beam.sdk.transforms.MapElements$1@50756c76, mainOutputTag=Tag<output>}
Caused by: java.io.NotSerializableException: com.google.gson.JsonObject

1 Answer:

Answer 0 (score: 1)

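A DoFn is serialized together with every object it references (here the function is the SimpleFunction passed to MapElements, which Beam wraps in a DoFn). JsonObject is not serializable. In your code it is created outside the function and referenced inside it, which makes the function itself non-serializable.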

You can avoid this serialization problem by creating the JsonObject inside the DoFn.
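The capture problem can be reproduced with plain JDK serialization, without Beam on the classpath. The following is only a minimal sketch: `CaptureDemo`, `Manifest`, and `SerializableFn` are hypothetical names, with `Manifest` standing in for `JsonObject` (a class that does not implement `Serializable`) and `SerializableFn` standing in for a Beam function type.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical names throughout; this is plain JDK code, not Beam.
class CaptureDemo {

    // Stand-in for com.google.gson.JsonObject: it does not implement Serializable.
    static class Manifest {
        final StringBuilder body = new StringBuilder();
    }

    // Stand-in for a Beam function type, which must be Serializable.
    interface SerializableFn extends Serializable {
        String apply(String input);
    }

    // Returns true if Java serialization accepts the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Captures an object created outside the function, like `obj` in the question.
    static SerializableFn capturing() {
        Manifest outer = new Manifest();
        return new SerializableFn() {
            @Override
            public String apply(String input) {
                outer.body.append(input);
                return outer.body.toString();
            }
        };
    }

    // Creates the object inside the function, so nothing extra is captured.
    static SerializableFn selfContained() {
        return new SerializableFn() {
            @Override
            public String apply(String input) {
                Manifest local = new Manifest();
                local.body.append(input);
                return local.body.toString();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(capturing()));     // false
        System.out.println(canSerialize(selfContained())); // true
    }
}
```

The captured local variable becomes a hidden field of the anonymous class, so serializing the function also tries to serialize the `Manifest`, which fails in the same way the `JsonObject` does in the question.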

public class BeamPipeline {
    private static final Logger log = LoggerFactory.getLogger(BeamPipeline.class);
    public static interface MyOptions extends PipelineOptions {

        @Description("Input Path(with gs:// prefix)")
        String getInput();
        void setInput(String value);
    }

    public static void main(String[] args) {

        MyOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class);
        Pipeline p = Pipeline.create(options);
        File dir = new File(options.getInput());
        for (File file : dir.listFiles()) {
                String inputString = file.toString();
                p
                        .apply("Match Files", FileIO.match().filepattern(inputString))
                        .apply("Read Files", FileIO.readMatches())
                        .apply(MapElements.via(new SimpleFunction<FileIO.ReadableFile, KV<String, String>>() {
                            public KV<String, String> apply(FileIO.ReadableFile file) {
                                String temp;
                                try {
                                    temp = file.readFullyAsUTF8String();
                                } catch (IOException e) {
                                    // Fail the element rather than hashing a null string.
                                    throw new java.io.UncheckedIOException(e);
                                }
                                String sha256hex = org.apache.commons.codec.digest.DigestUtils.sha256Hex(temp);

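                                // Created inside the function, so no non-serializable object is captured.
                                JsonObject obj = new JsonObject();
                                // Key the entry by the file's resource ID rather than by its contents.
                                obj.addProperty(file.getMetadata().resourceId().toString(), sha256hex);
                                String json = obj.toString();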

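                                // Note: opening without append truncates manifest.json on every element.
                                try (FileWriter fileWriter = new FileWriter("./manifest.json")) {
                                    fileWriter.write(json);
                                } catch (IOException e) {
                                    throw new java.io.UncheckedIOException(e);
                                }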

                                return KV.of(file.getMetadata().resourceId().toString(), sha256hex);

                            }
                        }))
                        .apply("Print", ParDo.of(new DoFn<KV<String, String>, Void>() {
                            @ProcessElement
                            public void processElement(ProcessContext c) {


                                log.info(String.format("File: %s, SHA-256 %s", c.element().getKey(), c.element().getValue()));

                            }
                        }));
        }
        p.run();
    }
}
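
One caveat about the code above: it builds a fresh JsonObject and reopens manifest.json for every file, so the file can only ever hold one pair, and on a distributed runner it would be written to a worker's local disk. A manifest with all entries needs the pairs gathered first and written once. The following is only a sketch of that final step in plain JDK code, with Gson and Beam left out so it stands alone; `ManifestSketch` and its methods are hypothetical names, and the hand-rolled `quote` helper escapes only backslashes and double quotes, so a real JSON library is the safer choice for arbitrary file names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper class; not part of Beam or Gson.
class ManifestSketch {

    // SHA-256 of the UTF-8 bytes of a string, as lowercase hex.
    static String sha256Hex(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Minimal escaping: handles only backslash and double quote.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds one JSON object from all pairs, sorted by key for stable output.
    static String toJson(Map<String, String> checksumsByFile) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : new TreeMap<>(checksumsByFile).entrySet()) {
            json.add(quote(e.getKey()) + ":" + quote(e.getValue()));
        }
        return json.toString();
    }

    // Writes the whole manifest in a single call instead of once per file.
    static void writeManifest(Path target, Map<String, String> checksumsByFile)
            throws IOException {
        Files.write(target, toJson(checksumsByFile).getBytes(StandardCharsets.UTF_8));
    }
}
```

In a pipeline, the map passed to `writeManifest` would correspond to the KV pairs after they have been combined into a single element; how that combining step is written depends on the runner and on where the manifest should be stored.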