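I am using TextIO in Apache Beam to read a JSON file, and I am trying to write it back to a file (just to see what is being read). The data in the output file is not formatted correctly.
Input: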
[{
"id": 1,
"username": "mrscarter",
"email_address": null,
"phone_number": null,
"first_name": "Beyonce",
"last_name": "Knowles-Carter",
"middle_name": null,
"sex": null,
"birthdate": "1981-09-04",
"join_date": "2016-01-01",
"previous_logins": 10000,
"last_ip": "testing"
},
{
"id": 2,
"username": "jayz",
"email_address": null,
"phone_number": null,
"first_name": "Shawn",
"last_name": "Carter",
"middle_name": "Corey",
"sex": null,
"birthdate": "1969-12-04",
"join_date": "2016-01-01",
"previous_logins": 20000,
"last_ip": null
}]
Output:
[{
"id": 1,
"username": "mrscarter",
"email_address": null,
"phone_number": null,
},
{
"id": 2,
"username": "jayz",
"email_address": null,
"phone_number": null,
"middle_name": null,
"sex": null,
"birthdate": "1981-09-04",
"join_date": "2016-01-01",
"middle_name": "Corey",
"sex": null,
"birthdate": "1969-12-04",
"join_date": "2016-01-01",
"previous_logins": 20000,
"last_ip": null
}]
"first_name": "Shawn",
"last_name": "Carter",
"first_name": "Beyonce",
"last_name": "Knowles-Carter",
"previous_logins": 10000,
"last_ip": "testing"
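Why does this happen? Is there a way to write to the file in order?
Edit:
This is the basic pipeline that reads the JSON and writes it back to a file.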
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);

// The Avro schema is parsed here but is not used by the pipeline below.
Schema schema = new Schema.Parser().parse(
    new File("C:\\Users\\abc\\beamexample\\src\\main\\schemas\\test.avsc"));

// TextIO.read() produces one String element per line of the input file.
PCollection<String> inputJson = p.apply("Read JSON",
    TextIO.read().from("C:\\Users\\abc\\Desktop\\test.json"));

// Write the elements back out as a single, unsharded file.
inputJson.apply(TextIO.write().withoutSharding()
    .to("C:\\Users\\abc\\beamexample\\Output.txt"));

p.run();
I have no other way to check whether the PCollection contains data, so I am writing it back to a file to check.
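For what it is worth, here is a minimal plain-Java sketch (no Beam involved) of what I suspect is going on. It assumes that TextIO.read() turns every line of the file into a separate element and that a PCollection does not guarantee element order. The class and method names (LineOrderSketch, readAsLines, readAsWhole, shuffled) are made up for illustration, and the shuffle only stands in for whatever ordering a runner happens to produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class LineOrderSketch {

    // Mimics a line-based reader: every line becomes an independent element.
    static List<String> readAsLines(String fileContent) {
        return new ArrayList<>(Arrays.asList(fileContent.split("\n")));
    }

    // Mimics a whole-file reader: the entire file is one element.
    static List<String> readAsWhole(String fileContent) {
        return Collections.singletonList(fileContent);
    }

    // Stand-in for an unordered collection: elements may come back in any order.
    static List<String> shuffled(List<String> elements, long seed) {
        List<String> copy = new ArrayList<>(elements);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        String json = "[{\n\"id\": 1,\n\"username\": \"mrscarter\"\n},\n"
                + "{\n\"id\": 2,\n\"username\": \"jayz\"\n}]";

        // Line elements can be written in a different order than they were read.
        System.out.println(String.join("\n", shuffled(readAsLines(json), 42L)));

        // A single whole-file element stays intact no matter how it is ordered.
        System.out.println(String.join("\n", shuffled(readAsWhole(json), 42L)));
    }
}
```

If that assumption holds, reading each file as one element should keep the JSON intact. In Beam this might be done with FileIO.match(), FileIO.readMatches() and ReadableFile.readFullyAsUTF8String(), but I have not tried that in this pipeline.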