Java: Read a Parquet file as JSON output

Time: 2018-08-29 19:55:16

Tags: java json apache-spark hadoop parquet

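I'm reading a Parquet file, but the output comes out in an indented format instead of the JSON format I need. Any ideas? I was thinking I might need to change GroupRecordConverter, but I couldn't find much documentation on it. Any pointers in the right direction would also help. Thanks a lot for your help.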

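import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.io.ColumnIOFactory;
import org.apache.parquet.io.MessageColumnIO;
import org.apache.parquet.io.RecordReader;
import org.apache.parquet.schema.MessageType;

// conf (Configuration), path (Path) and numLines are defined by the caller
long num = numLines; // maximum number of records to print
try {
  // The footer holds the file schema
  ParquetMetadata readFooter = ParquetFileReader.readFooter(conf, path, ParquetMetadataConverter.NO_FILTER);
  MessageType schema = readFooter.getFileMetaData().getSchema();
  MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);

  // try-with-resources closes the reader when done
  try (ParquetFileReader r = new ParquetFileReader(conf, path, readFooter)) {
    PageReadStore pages;
    while (num > 0 && null != (pages = r.readNextRowGroup())) {
      final long rows = pages.getRowCount();
      System.out.println("Number of rows: " + rows);

      RecordReader<Group> recordReader = columnIO.getRecordReader(pages, new GroupRecordConverter(schema));
      for (long i = 0; i < rows && num-- > 0; i++) {
        // Group.toString() produces the indented format shown below
        System.out.println(recordReader.read().toString());
      }
    }
  }
} catch (IOException e) {
  e.printStackTrace();
}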

Current indented output:

data1: value1
data2: value2
models
  map
    key: data3
    value
      array: value3
  map
    key: data4
    value
      array: value4
data5: value5
...

Desired JSON output:

"data1": "value1",
"data2": "value2",
"models": {
    "data3": [
        "value3"
    ],
    "data4": [
        "value4"
    ]
},
"data5": "value5"
...

2 answers:

Answer 0 (score: 0)

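The code for the cat command of the Java Parquet library's command-line tools (parquet-tools) might give you an example... It contains the following line: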

org.apache.parquet.tools.json.JsonRecordFormatter.JsonGroupFormatter formatter = JsonRecordFormatter.fromSchema(metadata.getFileMetaData().getSchema());

For the full code, see here.

Answer 1 (score: 0)

I modified the source code of SimpleRecord to add a method that produces a JsonObject.
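A minimal sketch of that idea in plain Java, with no Parquet dependency: the hypothetical Node class below stands in for a record such as SimpleRecord or Group (an ordered list of name/value pairs, where a repeated field appears several times), and toJson walks it and emits JSON instead of indented text. It assumes the MAP and LIST layouts can be recognized by the field names map, key, value and array seen in the question's output; a real converter should read the MAP/LIST annotations from the Parquet schema instead. This sketch also writes every leaf as a JSON string, so numbers and booleans would need extra handling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordToJson {
    // Hypothetical record node: ordered (name, value) pairs; a value is a String
    // leaf or a nested Node. A repeated field shows up as several pairs with one name.
    static final class Node {
        final List<String> names = new ArrayList<>();
        final List<Object> values = new ArrayList<>();
        Node add(String name, Object value) {
            names.add(name);
            values.add(value);
            return this;
        }
        // True if the node has at least one field and every field has this name.
        boolean allNamed(String name) {
            return !names.isEmpty() && names.stream().allMatch(name::equals);
        }
        Object field(String name) {
            int i = names.indexOf(name);
            if (i < 0) throw new IllegalArgumentException("missing field: " + name);
            return values.get(i);
        }
    }

    static String toJson(Object value) { StringBuilder sb = new StringBuilder(); write(value, sb); return sb.toString(); }

    private static void write(Object value, StringBuilder sb) {
        if (value instanceof String) { writeString((String) value, sb); return; }
        Node node = (Node) value;
        if (node.allNamed("array")) {
            // LIST layout (name-based guess): repeated "array" elements become a JSON array.
            writeArray(node.values, sb);
        } else if (node.allNamed("map")) {
            // MAP layout (name-based guess): each "map" group holds one key and one value.
            sb.append('{');
            for (int i = 0; i < node.values.size(); i++) {
                Node entry = (Node) node.values.get(i);
                if (i > 0) sb.append(',');
                writeString((String) entry.field("key"), sb);
                sb.append(':');
                write(entry.field("value"), sb);
            }
            sb.append('}');
        } else {
            // Ordinary group: fields become members; a repeated name becomes an array.
            Map<String, List<Object>> byName = new LinkedHashMap<>();
            for (int i = 0; i < node.names.size(); i++) {
                byName.computeIfAbsent(node.names.get(i), k -> new ArrayList<>()).add(node.values.get(i));
            }
            sb.append('{');
            String sep = "";
            for (Map.Entry<String, List<Object>> e : byName.entrySet()) {
                sb.append(sep);
                sep = ",";
                writeString(e.getKey(), sb);
                sb.append(':');
                if (e.getValue().size() == 1) write(e.getValue().get(0), sb);
                else writeArray(e.getValue(), sb);
            }
            sb.append('}');
        }
    }

    private static void writeArray(List<Object> items, StringBuilder sb) {
        sb.append('[');
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) sb.append(',');
            write(items.get(i), sb);
        }
        sb.append(']');
    }

    // Writes a JSON string literal, escaping quotes, backslashes and control characters.
    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        // The record from the question's indented output, rebuilt by hand.
        Node models = new Node()
            .add("map", new Node().add("key", "data3").add("value", new Node().add("array", "value3")))
            .add("map", new Node().add("key", "data4").add("value", new Node().add("array", "value4")));
        Node record = new Node().add("data1", "value1").add("data2", "value2")
            .add("models", models).add("data5", "value5");
        System.out.println(toJson(record));
    }
}
```

Running main prints the question's record as compact JSON on a single line:
{"data1":"value1","data2":"value2","models":{"data3":["value3"],"data4":["value4"]},"data5":"value5"}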
