NiFi PutHive3Streaming - writing to a partitioned table

Date: 2018-10-29 17:16:44

Tags: hive apache-nifi

I am using NiFi 1.7.1 to write to a partitioned Hive table. Although the data streams successfully, I see several messages like this in the Hive metastore log:

2018-10-29T17:09:40,682 ERROR [pool-10-thread-198]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) - AlreadyExistsException(message:Partition already exists: Partition(values:[2018, 3, 28], dbName:default, tableName:myTable, createTime:0, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:type, type:string, comment:null), FieldSchema(name:id, type:string, comment:null), FieldSchema(name:referenced_event_id, type:string, comment:null), FieldSchema(name:happened, type:string, comment:null), FieldSchema(name:processed, type:string, comment:null), FieldSchema(name:tracking_id, type:string, comment:null), FieldSchema(name:source_attributes, type:struct<id:string,origin:string,data:map<string,string>,external_data:map<string,string>>, comment:null), FieldSchema(name:event_data, type:struct<service:struct<name:string,version:string>,result:struct<mno:string,mvno:string,mcc:string,mnc:string,country:string>>, comment:null)], location:hdfs://node-master:8020/user/hive/warehouse/myTable/year=2018/month=3/day=28, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:6, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[tracking_id], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:null, catName:hive))

I have already tried supplying the partition values explicitly and disabling partition auto-creation:

"hive3-stream-part-vals": "${year},${month},${day}",
"hive3-stream-autocreate-partition": "false",

Does anyone know why these errors are being logged?

1 Answer:

Answer 0 (score: 2)

I think you are hitting https://issues.apache.org/jira/browse/HIVE-18931. What is the processor's Concurrent Tasks property set to? If it is greater than 1, can you try setting it to 1 and see whether you still get these messages? If it is already 1, do you have multiple clients (NiFi, Beeline, etc.) trying to write to the table at the same time?
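The race the linked issue describes can be illustrated with a toy model (a sketch, not Hive code: `FakeMetastore`, `streaming_writer`, and the partition tuple are all hypothetical names). Two concurrent writers each try to create the same partition; the metastore accepts one and raises AlreadyExistsException for the other, logging an ERROR server-side even though the losing client swallows the exception and carries on, so the data still lands successfully:

```python
import threading


class AlreadyExistsException(Exception):
    """Stands in for the metastore's AlreadyExistsException."""


class FakeMetastore:
    """Toy stand-in for the Hive metastore's add_partition call.

    add_partition is atomic server-side: the second caller for the
    same partition gets AlreadyExistsException, which the server
    records in its log (the ERROR seen in the question).
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._partitions = set()
        self.logged_errors = []

    def add_partition(self, part):
        with self._lock:
            if part in self._partitions:
                self.logged_errors.append(f"AlreadyExistsException: {part}")
                raise AlreadyExistsException(part)
            self._partitions.add(part)


def streaming_writer(metastore, part):
    # Mirrors the client-side behavior: try to create the partition,
    # and treat "it already exists" as success. The error is harmless
    # to the client but has already been logged by the metastore.
    try:
        metastore.add_partition(part)
    except AlreadyExistsException:
        pass


ms = FakeMetastore()
threads = [
    threading.Thread(target=streaming_writer, args=(ms, ("2018", "3", "28")))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one writer creates the partition; the other triggers one
# logged AlreadyExistsException, yet both writers complete normally.
```

With Concurrent Tasks set to 1 (and no other clients), only one writer ever attempts the partition creation, so the metastore never logs the error.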