Question

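I'm trying to parse some JSON that I have stored in Google Cloud Storage. I'm using Apache Beam to build a pipeline that reads the stored JSON and then writes it to a Cloud SQL database. While writing my parsing method, I ran into some strange behaviour.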

Here is my JSON:

        [{
            "projectid": "Reminder101",
            "reminderkey": "001",
            "localid": "01",
            "timestamp": "2018-01-24 12:00"
        },
        {
            "projectid": "Reminder101",
            "reminderkey": "002",
            "localid": "02",
            "timestamp": "2018-01-25 9:00"
        },
        {
            "projectid": "Reminder101",
            "reminderkey": "003",
            "localid": "03",
            "timestamp": "2018-02-01 18:00"
        },
        {
            "projectid": "Reminder101",
            "reminderkey": "004",
            "localid": "04",
            "timestamp": "2018-02-6 15:35"
        },
        {
            "projectid": "USReminder101",
            "reminderkey": "001",
            "localid": "01",
            "timestamp": "2018/01/30 21:00"
        }
        ]

Here is my JSON parse method (JsonHolder is just a POJO):

static class ParseJsonDoFn extends DoFn<String, List<JsonHolder>> {
  @ProcessElement
  public void processElement(ProcessContext context) {
    String incomingInfo = context.element();
    Gson gson = new Gson();
    System.out.println(incomingInfo.toString());
    Type type = new TypeToken<List<JsonHolder>>(){}.getType();
    List<JsonHolder> jsonholders = gson.fromJson(incomingInfo, type);
    context.output(jsonholders);
  }
}

Now when I run my pipeline like this, I get this error:

Exception in thread "main" 
org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
com.google.gson.JsonSyntaxException: 
com.google.gson.stream.MalformedJsonException: Expected value at line 1 
column 2 path $

The System.out.println() call prints:

"timestamp": "2018-02-01 18:00"
},
    "reminderkey": "004",
[{
    "localid": "01",
},
    "timestamp": "2018-01-24 12:00"
    "reminderkey": "002",

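However, if I create a single-line JSON file with no formatting, it parses just fine. What I can deduce is that the JSON file is being split at the end of every line, but I can't for the life of me figure out why, or how to correct it.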

Cheers.

Answer 1

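TextIO.read() reads text files and returns each line as a separate element of the PCollection. It does this so that it can process files of unbounded size without having to load a file's contents into memory.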
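To make the effect concrete, here is a minimal sketch in plain Java (no Beam, no Gson) that mimics a line-oriented reader on a shortened version of the JSON from the question. The class `LineSplitDemo` and its helpers are illustrative names, and `looksLikeCompleteArray` is only a rough bracket check, not a JSON parser. Note also that the elements of a PCollection carry no ordering guarantee, which is consistent with the shuffled lines in the printed output.

```java
import java.util.Arrays;
import java.util.List;

public class LineSplitDemo {
    // A shortened, pretty-printed version of the JSON from the question.
    static final String PRETTY_JSON = String.join("\n",
        "[{",
        "    \"projectid\": \"Reminder101\",",
        "    \"reminderkey\": \"001\"",
        "}]");

    // Mimics a line-oriented reader: one element per line of the file.
    static List<String> readAsLines(String fileContents) {
        return Arrays.asList(fileContents.split("\n"));
    }

    // Rough completeness check (NOT a JSON parser): a whole array document
    // starts with '[' and ends with ']'.
    static boolean looksLikeCompleteArray(String element) {
        String trimmed = element.trim();
        return trimmed.startsWith("[") && trimmed.endsWith("]");
    }

    public static void main(String[] args) {
        // Each line on its own is only a fragment of the document.
        for (String element : readAsLines(PRETTY_JSON)) {
            System.out.println("element: " + element
                + " -> complete? " + looksLikeCompleteArray(element));
        }
        // The unsplit file contents form one complete array.
        System.out.println("whole file complete? "
            + looksLikeCompleteArray(PRETTY_JSON));
    }
}
```

Each fragment handed to `gson.fromJson` on its own is not a full JSON document, which fits the MalformedJsonException in the question. A single-line file works because the one line happens to be the whole document.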

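If your input format is not line-delimited, you probably want the more flexible FileIO: match() to find the files matching the file pattern you are interested in, and readMatches() to automatically decompress them and give you a convenient handle to each file's contents as a ReadableFile. Then use a DoFn to parse the file in whatever way you like.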
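A minimal sketch of that approach, assuming the Beam Java SDK and Gson are on the classpath. The bucket path `gs://my-bucket/reminders/*.json` is hypothetical, and the `JsonHolder` class here is only a stand-in for the POJO in the question (made Serializable so that Beam can infer a coder for it). Unlike the original DoFn, this one emits each JsonHolder as its own element instead of a whole list; that is a design choice, not a requirement.

```java
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import java.io.IOException;
import java.io.Serializable;
import java.lang.reflect.Type;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class WholeFileJsonPipeline {

  // Hypothetical stand-in for the question's POJO; field names follow the JSON keys.
  public static class JsonHolder implements Serializable {
    String projectid;
    String reminderkey;
    String localid;
    String timestamp;
  }

  // Receives one ReadableFile per matched file and parses its full contents.
  static class ParseJsonFileDoFn extends DoFn<FileIO.ReadableFile, JsonHolder> {
    private static final Type LIST_TYPE =
        new TypeToken<List<JsonHolder>>() {}.getType();

    @ProcessElement
    public void processElement(ProcessContext context) throws IOException {
      // Reads the entire file into memory, so each file must be small enough to fit.
      String json = context.element().readFullyAsUTF8String();
      List<JsonHolder> holders = new Gson().fromJson(json, LIST_TYPE);
      if (holders == null) {
        return; // Gson returns null for empty input; nothing to emit.
      }
      for (JsonHolder holder : holders) {
        context.output(holder);
      }
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<JsonHolder> reminders =
        pipeline
            // Hypothetical file pattern; replace with the real bucket and path.
            .apply(FileIO.match().filepattern("gs://my-bucket/reminders/*.json"))
            .apply(FileIO.readMatches())
            .apply(ParDo.of(new ParseJsonFileDoFn()));

    // The Cloud SQL write step from the question would consume `reminders` here.
    pipeline.run().waitUntilFinish();
  }
}
```

The trade-off is the one the answer points out: TextIO can stream arbitrarily large files line by line, while this version holds each whole file in memory, which is fine for small JSON documents like the one shown.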

Strange behaviour when parsing JSON with a DoFn
