I am trying to use AvroKeyValueSinkWriter to save JSON data.
I create the writer like this:
protected static AvroKeyValueSinkWriter<String, JsonNode> getWriter() {
    Map<String, String> properties = new HashMap<>();
    Schema keySchema = Schema.create(Schema.Type.STRING);
    String valueSchema = "\n" +
            " {\"namespace\": \"transaction.avro\",\n" +
            "  \"type\": \"record\",\n" +
            "  \"name\": \"Transaction\",\n" +
            "  \"fields\": [\n" +
            "     {\"name\": \"InvoiceNo\", \"type\": \"int\" },\n" +
            "     {\"name\": \"StockCode\", \"type\": \"string\" },\n" +
            // ... some lines hidden
            "     {\"name\": \"StoreID\", \"type\": \"int\" },\n" +
            "     {\"name\": \"TransactionID\", \"type\": \"string\" }\n" +
            "  ]\n" +
            " }";
    properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_KEY_SCHEMA, keySchema.toString());
    properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_VALUE_SCHEMA, valueSchema);
    properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS, String.valueOf(true));
    properties.put(AvroKeyValueSinkWriter.CONF_COMPRESS_CODEC, DataFileConstants.SNAPPY_CODEC);
    return new AvroKeyValueSinkWriter<String, JsonNode>(properties);
}
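As an aside, AvroKeyValueSinkWriter lives in Flink's filesystem connector (the `org.apache.flink.streaming.connectors.fs` package in the stack trace below). If the class is not on your classpath, a Maven dependency along these lines is needed — the Scala suffix and version here are assumptions for a Flink 1.x build, adjust to match your setup:

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.11</artifactId>
    <version>1.4.2</version>
</dependency>
```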
Then I call it from my test case:
@Test
public void testWriter() throws Exception {
    AvroKeyValueSinkWriter<String, JsonNode> writer = getWriter();
    String key = "5639281840180123";
    String jsonString = "{\n" +
            "  \"InvoiceNo\": 5370812,\n" +
            "  \"StockCode\": 22409,\n" +
            // ... some lines hidden
            "  \"StoreID\": 0,\n" +
            "  \"TransactionID\": \"537081210180130\"\n" +
            "}";
    ObjectMapper mapper = new ObjectMapper();
    JsonNode jsonObj = mapper.readTree(jsonString);
    writer.setSyncOnFlush(true);
    writer.open(fs, path);
    writer.write(new Tuple2<String, JsonNode>(key, jsonObj));
    writer.flush();
    //assertEquals();
}
However, the write fails with:
org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: org.codehaus.jackson.node.ObjectNode cannot be cast to org.apache.avro.generic.IndexedRecord
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
at org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter$AvroKeyValueWriter.write(AvroKeyValueSinkWriter.java:247)
at org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter.write(AvroKeyValueSinkWriter.java:178)
at com.ibm.cloud.flink.StreamingJobTest.testWriter(StreamingJobTest.java:88)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.lang.ClassCastException: org.codehaus.jackson.node.ObjectNode cannot be cast to org.apache.avro.generic.IndexedRecord
Is it possible to write a JsonNode with AvroKeyValueSinkWriter? If not, what can I pass instead?
Answer (score: 0)
The solution was to generate Java source code from my Avro schema and use the generated class:
DataStream<Tuple2<String, Object>> avroStream = objStream
        .map(new MapFunction<ObjectNode, Tuple2<String, Object>>() {
            @Override
            public Tuple2<String, Object> map(ObjectNode value) throws Exception {
                JsonNode result = value.get("value");
                Transaction tx = convertToAvro(result);
                // asText() yields the raw string; toString() would keep the JSON quotes
                return new Tuple2<>(value.get("key").asText(), tx);
            }
        });
....
public static Transaction convertToAvro(JsonNode result) {
    return Transaction.newBuilder()
            .setInvoiceNo(result.get("InvoiceNo").intValue())
            .setStockCode(result.get("StockCode").intValue())
            .setDescription(result.get("Description").asText())
            .setQuantity(result.get("Quantity").intValue())
            .setInvoiceDate(result.get("InvoiceDate").longValue())
            .setUnitPrice(result.get("UnitPrice").floatValue())
            .setCustomerID(result.get("CustomerID").intValue())
            .setCountry(result.get("Country").asText())
            .setLineNo(result.get("LineNo").intValue())
            .setInvoiceTime(result.get("InvoiceTime").asText())
            .setStoreID(result.get("StoreID").intValue())
            .setTransactionID(result.get("TransactionID").asText())
            .build();
}
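For completeness, one common way to generate the Transaction class from the schema is the avro-maven-plugin. This is a sketch only — the plugin version and the assumption that the schema is saved as an .avsc file under src/main/avro/ are mine, not from the original post:

```xml
<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.8.2</version>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <!-- the "schema" goal compiles .avsc files into SpecificRecord classes -->
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                <outputDirectory>${project.basedir}/target/generated-sources/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>
```

The generated Transaction implements Avro's SpecificRecord (and hence IndexedRecord), which is exactly what the DataFileWriter cast in the stack trace requires.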