Data written to a file is out of order in Apache Beam

Time: 2018-10-03 20:37:46

Tags: file apache-beam

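I am reading a JSON file with TextIO in Apache Beam and writing it straight back to a file, only to see what is being read. The data in the output file is not formatted correctly; the lines are out of order.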

Input:

[{
    "id": 1,
    "username": "mrscarter",
    "email_address": null,
    "phone_number": null,
    "first_name": "Beyonce",
    "last_name": "Knowles-Carter",
    "middle_name": null,
    "sex": null,
    "birthdate": "1981-09-04",
    "join_date": "2016-01-01",
    "previous_logins": 10000,
    "last_ip": "testing"
},

{
    "id": 2,
    "username": "jayz",
    "email_address": null,
    "phone_number": null,
    "first_name": "Shawn",
    "last_name": "Carter",
    "middle_name": "Corey",
    "sex": null,
    "birthdate": "1969-12-04",
    "join_date": "2016-01-01",
    "previous_logins": 20000,
    "last_ip": null
}]

Output:

 [{
    "id": 1,
    "username": "mrscarter",
    "email_address": null,
    "phone_number": null,
},

{
    "id": 2,
    "username": "jayz",
    "email_address": null,
    "phone_number": null,
    "middle_name": null,
    "sex": null,
    "birthdate": "1981-09-04",
    "join_date": "2016-01-01",
    "middle_name": "Corey",
    "sex": null,
    "birthdate": "1969-12-04",
    "join_date": "2016-01-01",
    "previous_logins": 20000,
    "last_ip": null
}]
    "first_name": "Shawn",
    "last_name": "Carter",
    "first_name": "Beyonce",
    "last_name": "Knowles-Carter",
    "previous_logins": 10000,
    "last_ip": "testing"

Why does this happen? Is there a way to write to the file in order?
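A likely explanation, offered as a sketch rather than a confirmed diagnosis: TextIO.read() emits every line of the file as a separate element, and a PCollection carries no ordering guarantee, so the lines of a pretty-printed JSON document can be written back in any order. The simulation below is plain Java and does not use Beam; it shuffles lines to mimic an unordered collection and contrasts that with treating the whole document as a single element. The class and method names (`LineOrderSketch`, `readAsLines`, `unordered`, `readAsSingleElement`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class LineOrderSketch {

    // Mimics a line-based read: each line becomes an independent element.
    static List<String> readAsLines(String document) {
        return new ArrayList<>(Arrays.asList(document.split("\n")));
    }

    // Mimics an unordered collection: elements may come back in any order.
    // The shuffle is only a stand-in for that lack of ordering.
    static List<String> unordered(List<String> elements, long seed) {
        List<String> copy = new ArrayList<>(elements);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    // Mimics a whole-file read: the document is one element,
    // so the order of the lines inside it cannot change.
    static List<String> readAsSingleElement(String document) {
        return Collections.singletonList(document);
    }

    public static void main(String[] args) {
        String json = "[{\n    \"id\": 1,\n    \"username\": \"mrscarter\"\n}]";

        // Line-based: same lines, possibly in a different order.
        System.out.println(String.join("\n", unordered(readAsLines(json), 42L)));

        // Single element: the document comes back intact.
        System.out.println(String.join("\n", unordered(readAsSingleElement(json), 42L)));
    }
}
```

In Beam itself, the single-element variant roughly corresponds to FileIO.match() followed by FileIO.readMatches() and ReadableFile.readFullyAsUTF8String(), which yields the whole file as one String. That mapping is an assumption about what would suit this pipeline and has not been tested here.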

Edit

This is the basic pipeline that reads the JSON and writes it back to a file.

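// Imports assumed by this snippet (not shown in the original post).
import java.io.File;
import org.apache.avro.Schema;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);

// Avro schema loaded from disk; it is parsed here but not used by the steps below.
Schema schema = new Schema.Parser().parse(
    new File("C:\\Users\\abc\\beamexample\\src\\main\\schemas\\test.avsc"));

// TextIO.read() emits each line of the file as a separate String element.
PCollection<String> inputJson =
    p.apply("Read JSON", TextIO.read().from("C:\\Users\\abc\\Desktop\\test.json"));

// withoutSharding() forces a single output file.
inputJson.apply(
    TextIO.write().withoutSharding().to("C:\\Users\\abc\\beamexample\\Output.txt"));

// Run the pipeline and block until it finishes.
p.run().waitUntilFinish();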

I have no other way to check whether the PCollection contains data, so I write it back to a file to inspect it.
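Because TextIO.read() in the pipeline above treats every line as a separate element, one possible workaround is to convert the pretty-printed array into newline-delimited JSON before the pipeline reads it. Each record then sits on a single line, and line order no longer affects the structure of a record. The sketch below is plain Java, not a Beam API; `JsonArraySplitter` and `splitTopLevelObjects` are hypothetical names, and the splitter assumes the input is a well-formed JSON array of objects.

```java
import java.util.ArrayList;
import java.util.List;

final class JsonArraySplitter {

    // Returns each top-level object of a JSON array as a compact one-line string.
    // Assumption: the input is a well-formed JSON array whose elements are objects.
    static List<String> splitTopLevelObjects(String jsonArray) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // nesting depth of '{' ... '}'
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string

        for (int i = 0; i < jsonArray.length(); i++) {
            char c = jsonArray.charAt(i);

            if (inString) {
                // Copy string contents verbatim, including spaces and braces.
                if (depth > 0) current.append(c);
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
                continue;
            }
            if (c == '"') {
                inString = true;
                if (depth > 0) current.append(c);
                continue;
            }
            if (c == '{') depth++;
            // Outside strings, drop whitespace and anything outside an object.
            if (depth > 0 && !Character.isWhitespace(c)) current.append(c);
            if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    records.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String json = "[{\n    \"id\": 1,\n    \"username\": \"mrscarter\"\n},\n\n"
                    + "{\n    \"id\": 2,\n    \"username\": \"jayz\"\n}]";
        // Print one record per line (newline-delimited JSON).
        for (String record : splitTopLevelObjects(json)) {
            System.out.println(record);
        }
    }
}
```

Writing each returned string on its own line gives a file that a line-based reader can consume one record at a time. The records may still come back in a different order, but each one stays intact.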

0 answers:

No answers