I want to save Parquet files directly to HDFS using Java.
This is the code I use to generate Parquet files and store them locally, but now I want to store them in HDFS.
final String schemaLocation = "/home/javier/FlinkProjects/kafka-flink/src/main/java/com/grallandco/demos/avro.json";
final Schema avroSchema = new Schema.Parser().parse(new File(schemaLocation));
final MessageType parquetSchema = new AvroSchemaConverter().convert(avroSchema);
final WriteSupport writeSupport = new AvroWriteSupport(parquetSchema, avroSchema);
final String parquetFile = "/home/javier/parquet-files/data" + postfijoFilename + ".parquet";
final Path path = new Path(parquetFile);
AvroParquetWriter<GenericRecord> parquetWriter = new AvroParquetWriter<>(path,
        avroSchema, CompressionCodecName.SNAPPY,
        ParquetWriter.DEFAULT_BLOCK_SIZE, ParquetWriter.DEFAULT_PAGE_SIZE);
final GenericRecord record = new GenericData.Record(avroSchema);
record.put(Constantes.CAMPO_ID, datos[0]);
record.put("movie", datos[1]);
record.put("date", datos[2]);
record.put("imdb", datos[3]);
parquetWriter.write(record);
parquetWriter.close(); // flush buffered pages and write the Parquet footer
I want to replace this:
final String parquetFile = "/home/javier/parquet-files/data" + postfijoFilename + ".parquet";
with a Hadoop HDFS path. Any ideas?
Answer 0: (score: 0)
You can do it the following way (note that the target directory must already exist, and change the HDFS URL and username in the code to match your setup; the schema file may also need to live in HDFS):
final String schemaLocation = "/home/javier/FlinkProjects/kafka-flink/src/main/java/com/grallandco/demos/avro.json";
final Schema avroSchema = new Schema.Parser().parse(new File(schemaLocation));
final MessageType parquetSchema = new AvroSchemaConverter().convert(avroSchema);
final WriteSupport writeSupport = new AvroWriteSupport(parquetSchema, avroSchema);
Configuration configuration = new Configuration();
String hdfsUrl = "hdfs://hadoopnamenode:9000/";
String username = "hduser";
// Fully qualify the path with the HDFS URL so the writer targets HDFS
// rather than the local file system.
final Path path = new Path(hdfsUrl + "user/hduser/parquet-files/data" + postfijoFilename + ".parquet");
FileSystem fs = FileSystem.get(new URI(hdfsUrl), configuration);
UserGroupInformation ugi = UserGroupInformation.createRemoteUser(username);
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        AvroParquetWriter<GenericRecord> parquetWriter = new AvroParquetWriter<>(path,
                avroSchema,
                CompressionCodecName.SNAPPY,
                ParquetWriter.DEFAULT_BLOCK_SIZE,
                ParquetWriter.DEFAULT_PAGE_SIZE);
        final GenericRecord record = new GenericData.Record(avroSchema);
        record.put(Constantes.CAMPO_ID, datos[0]);
        record.put("movie", datos[1]);
        record.put("date", datos[2]);
        record.put("imdb", datos[3]);
        parquetWriter.write(record);
        parquetWriter.close(); // flush buffered pages and write the Parquet footer
        return null;
    }
});
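The key idea can be sketched in isolation: the Path handed to AvroParquetWriter only ends up on HDFS if it resolves against the cluster, and a fully qualified hdfs:// URI guarantees that regardless of the default file system. A minimal sketch (the hostname, port, and directory below are the placeholders from the answer above, not real values):

```java
import java.net.URI;

public class HdfsPathExample {
    // Builds a fully qualified HDFS location by resolving a relative
    // file path against the cluster's base URL. "hadoopnamenode" and
    // "hduser" are placeholder names; substitute your own cluster values.
    static String qualify(String hdfsUrl, String relativePath) {
        return URI.create(hdfsUrl).resolve(relativePath).toString();
    }

    public static void main(String[] args) {
        String full = qualify("hdfs://hadoopnamenode:9000/",
                "user/hduser/parquet-files/data1.parquet");
        System.out.println(full);
    }
}
```

With such a qualified string, `new Path(full)` points at HDFS even when the writer's `Configuration` defaults to the local file system.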