Apache NiFi / Cassandra: PutCassandraRecord fails to convert input into a Record object

Time: 2019-05-13 23:53:08

Tags: cassandra apache-nifi

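I am trying to insert some JSON records into a Cassandra database using NiFi's PutCassandraRecord processor. One of the fields is a timestamp, but NiFi fails with a NumberFormatException for the input string "2019-02-02T08:00:00.000".

The Cassandra column for this field is declared as (ts timestamp). In my Avro schema the field is defined as: {"name": "ts", "type": {"type": "long", "logicalType": "timestamp-millis"}}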

{
  "name": "app.records",
  "type": "record",
  "fields": [
    { "name": "ts", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    { "name": "app_name", "type": "string" },

The NiFi log shows that it can parse the JSON object but cannot convert it into a Record...

2019-05-13 21:13:04,036 ERROR [Timer-Driven Process Thread-2] o.a.n.p.cassandra.PutCassandraRecord PutCassandraRecord[id=ecb33d77-cc4a-17f5-23a8-e002e1777a1c] Unable to write the records into Cassandra table due to org.apache.nifi.serialization.MalformedRecordException: Successfully parsed a JSON object from input but failed to convert into a Record object with the given schema: org.apache.nifi.serialization.MalformedRecordException: Successfully parsed a JSON object from input but failed to convert into a Record object with the given schema
org.apache.nifi.serialization.MalformedRecordException: Successfully parsed a JSON object from input but failed to convert into a Record object with the given schema
        at org.apache.nifi.json.AbstractJsonRowRecordReader.nextRecord(AbstractJsonRowRecordReader.java:98)
        at org.apache.nifi.serialization.RecordReader.nextRecord(RecordReader.java:50)
        at org.apache.nifi.processors.cassandra.PutCassandraRecord.onTrigger(PutCassandraRecord.java:151)
        at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
        at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:209)
        at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
        at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "2019-02-02T08:00:35.473"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.nifi.serialization.record.util.DataTypeUtils.toTimestamp(DataTypeUtils.java:1057)
        at org.apache.nifi.serialization.record.util.DataTypeUtils.convertType(DataTypeUtils.java:156)
        at org.apache.nifi.serialization.record.util.DataTypeUtils.convertType(DataTypeUtils.java:120)
        at org.apache.nifi.json.JsonTreeRowRecordReader.convertField(JsonTreeRowRecordReader.java:170)
        at org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:137)
        at org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:83)
        at org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:74)
        at org.apache.nifi.json.AbstractJsonRowRecordReader.nextRecord(AbstractJsonRowRecordReader.java:93)
        ... 14 common frames omitted

The types all appear to be correct. Any help would be appreciated.

2 answers:

Answer 0 (score: 0)

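The problem is that you are trying to insert a timestamp field without specifying a date format. The corresponding code (DataTypeUtils.toTimestamp in the stack trace) behaves as follows:

If the input value is a string, NiFi tries to obtain its format string, and if that format string yields a valid formatter, it uses the formatter to parse the date. If no format string is specified, or the format string is invalid, NiFi falls back to converting the value with Long.parseLong. That fallback is what throws the NumberFormatException for "2019-02-02T08:00:35.473".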
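The fallback described above can be illustrated with a small Python analogue. This is only a sketch of the described behaviour, not NiFi's Java code: the function name `to_timestamp_millis` and its `fmt` parameter are hypothetical, and the Python `strptime` pattern stands in for the Java date format.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_timestamp_millis(value, fmt=None):
    """Hypothetical analogue of the conversion described above (not NiFi code)."""
    if isinstance(value, str) and fmt:
        # A format is available: parse the string as a date and return
        # milliseconds since the epoch using integer arithmetic.
        parsed = datetime.strptime(value, fmt)
        return (parsed - EPOCH) // timedelta(milliseconds=1)
    # No format: fall back to a plain numeric parse, comparable to
    # Long.parseLong. An ISO-8601 string fails here with ValueError,
    # much like the NumberFormatException in the stack trace.
    return int(value)
```

With a format, `to_timestamp_millis("2019-02-02T08:00:35.473", "%Y-%m-%dT%H:%M:%S.%f")` yields an integer millisecond value. Without one, the same string raises `ValueError`, while a numeric string such as `"1549094435473"` is accepted.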

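You need to convert the field explicitly, using something like the following. Note that the pattern uses HH (24-hour clock) rather than hh, and includes .SSS so that it matches the milliseconds in the input:

toDate("yyyy-MM-dd'T'HH:mm:ss.SSS")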

Answer 1 (score: 0)

I ended up converting the datetime to an epoch timestamp in milliseconds and casting it to a long so that it works with my Avro schema.

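import datetime

# strippedTime is assumed to hold the timestamp with a space instead of 'T'
ts = datetime.datetime.strptime(strippedTime, '%Y-%m-%d %H:%M:%S.%f')
epoch = datetime.datetime(1970, 1, 1)
# Milliseconds since the epoch; on Python 2 use long(...) instead of int(...)
timestamp = int(round((ts - epoch).total_seconds() * 1000))
fields['ts'] = timestamp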
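The same idea can be sketched as a self-contained Python 3 script that rewrites a record before it reaches NiFi. The helper names `iso_to_epoch_millis` and `convert_record` and the sample `app_name` value are hypothetical, and the sketch assumes the timestamp string carries no time zone and should be read as UTC.

```python
import json
from datetime import datetime, timedelta, timezone

# Matches strings such as "2019-02-02T08:00:35.473" (assumed input shape).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_millis(text):
    """Read a zone-less ISO-8601 string as UTC and return epoch milliseconds."""
    parsed = datetime.strptime(text, TS_FORMAT).replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def convert_record(record, field="ts"):
    """Return a copy of the record with the given field as epoch milliseconds."""
    converted = dict(record)
    converted[field] = iso_to_epoch_millis(record[field])
    return converted


if __name__ == "__main__":
    # Sample record; the app_name value is made up for illustration.
    sample = {"ts": "2019-02-02T08:00:35.473", "app_name": "demo"}
    print(json.dumps(convert_record(sample)))
```

Because the converted value is a plain integer, it lines up with the long / timestamp-millis type declared in the Avro schema, so the record reader no longer has to parse a date string.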