Error: "invalid char between encapsulated token and delimiter" when using ConvertRecord in NiFi

Time: 2018-12-19 00:15:45

Tags: apache amazon-s3 apache-nifi

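I am using the following flow (basically fetching a file from S3, converting a few records from the main CSV file, and then pushing them to Elasticsearch): GetSQS -> UpdateAttribute -> SplitJson -> EvaluateJsonPath -> UpdateAttribute -> ConvertRecord -> other processors...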

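I am able to fetch the file from S3 correctly, but the ConvertRecord processor fails with the error: invalid char between encapsulated token and delimiter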

The ConvertRecord configuration is below:

**CSVRecordReader**: Schema Access Strategy set to "Use 'Schema Text' Property"

Schema Text: 


{
  "type": "record",
  "name": "AVLRecord0",
  "fields" : [
    {"name": "TimeOfDay","type": "string", "logicalType":"timestamp-millis"},
    {"name": "Field_0", "type": "double"},
    {"name": "Field_1", "type": "double"},
    {"name": "Field_2", "type": "double"},
    {"name": "Field_3", "type": "double"}}
]
}
**CSVRecordSetWriter**:

Schema Write Strategy: Set 'avro.schema' Attribute

Schema Access Strategy: Use 'Schema Text' Property

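Please tell me why I cannot see the converted records even though the file is fetched from S3 successfully.

The required output is CSV format only. Please find attached a sample of the file uploaded to S3; I only want to convert the fields up to field_5.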

Screenshots of the controller services were attached to the original post (images not reproduced here).

Thanks!

2 Answers:

Answer 0 (score: 0)

I found my own mistakes:

1. I forgot to add a FetchS3Object processor after EvaluateJsonPath.
2. There was an extra comma in my Schema Text property.
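A stray comma in the Schema Text is easy to miss by eye, so it can help to run the text through a JSON parser before pasting it into the controller service. The sketch below is only an illustration: `check_schema_text` is a hypothetical helper, and the `BAD` schema is a made-up variant with a trailing comma, since the answer does not say where the comma actually was. It checks JSON syntax only, not Avro semantics.

```python
import json


def check_schema_text(schema_text):
    """Return None if the text is valid JSON, else a message locating the problem.

    Hypothetical helper: this validates JSON syntax only, not Avro rules.
    """
    try:
        json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# Illustrative schema with a trailing comma after the last field (assumed, not
# taken from the question).
BAD = """{
  "type": "record",
  "name": "AVLRecord0",
  "fields": [
    {"name": "TimeOfDay", "type": "string"},
    {"name": "Field_0", "type": "double"},
  ]
}"""

# Same schema with the trailing comma removed.
GOOD = BAD.replace('"double"},', '"double"}')

if __name__ == "__main__":
    print("BAD :", check_schema_text(BAD))
    print("GOOD:", check_schema_text(GOOD))
```

The exact wording and position of the reported error vary between Python versions, so treat the message as a pointer to the neighbourhood of the problem rather than an exact location.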

Answer 1 (score: -1)

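Could you say exactly where that extra comma was in the ConvertRecord processor? I am facing the same problem. As far as I can tell, the problem is caused by the size_dimension field. Here is my CSV data: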

id,project,name,depth,parentid,description,createdtime,lastupdatedtime,metadata,path,source,sourceid
75125,abcd,P200184,4,74861,"WIRELINE RUNNING / RETRIEVING TOOL, SUPP",2002-06-04 00:00:00.0,2019-04-26 00:00:00.0,"{""material_group"":""group"",""weight_unit"":""LB"",""laboratory"":""PMC"",""object_type"":""material"",""pyspark_generated_time"":""2019-06-07, 13:32:20.287657"",""size_dimension"":""3'5\""L X 3'5\""W X 1'H"",""gross_weight"":""100.000"",""net_weight"":""100.000"",""valid_from_date"":""20031219""}","[59941,64249,74859,74861,75125]",RPA_SAA.MRA,P200184

The Avro schema I am using is:

{
    "name":"abc",
    "namespace":"nifi",
    "type":"record",
    "fields": [
    {"name":"id", "type": ["long", "null"], "default": null},
    {"name":"project", "type": ["string", "null"], "default": null},
    {"name":"name", "type": ["string", "null"], "default": null},
    {"name":"depth", "type": ["int", "null"], "default": null},
    {"name":"parentid", "type": ["long", "null"], "default": null},
    {"name":"description", "type": ["string", "null"], "default": null},
    {"name":"createdtime","type": ["null",{ "type":"long", "logicalType":"timestamp-millis"}], "default":null},
    {"name":"lastupdatedtime","type": ["null",{ "type":"long", "logicalType":"timestamp-millis"}], "default":null},
    {"name":"metadata","type": ["string", "null"], "default": null},
    {"name":"path","type": ["string", "null"], "default": null},
    {"name":"source", "type": ["string", "null"], "default": null},
    {"name":"sourceid", "type": ["string", "null"], "default": null}
    ]
}
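
One possible explanation for the size_dimension suspicion, not confirmed anywhere in this thread: the metadata column contains `\""`, which is a backslash followed by a CSV-doubled quote. If the reader treats backslash as an escape character (which I believe is the default Escape Character of NiFi's CSVReader, so treat that as an assumption), then `\"` consumes the first quote, the second quote closes the field, and the following `L` is neither a delimiter nor a line end. The sketch below reproduces that behaviour with Python's `csv` module in strict mode as a stand-in for the parser NiFi uses; `ROW` is a shortened, hypothetical row modelled on the data above, and `parse_row` is a hypothetical helper.

```python
import csv
import json

# Shortened, hypothetical row: a quoted CSV field holding JSON whose value
# contains \" (CSV quote-doubling turns that into \"").
ROW = r'''75125,"{""size_dimension"":""3'5\""L X 1'H""}",P200184'''


def parse_row(line, escapechar=None):
    """Parse one CSV line strictly, optionally with an escape character."""
    reader = csv.reader(
        [line],
        delimiter=",",
        quotechar='"',
        doublequote=True,
        escapechar=escapechar,
        strict=True,  # raise instead of silently accepting malformed quoting
    )
    return next(reader)


if __name__ == "__main__":
    # Without an escape character, "" is simply a doubled quote and the row parses.
    fields = parse_row(ROW)
    metadata = json.loads(fields[1])
    print(len(fields), "fields; size_dimension =", metadata["size_dimension"])

    # With backslash as the escape character, the quoting no longer balances.
    try:
        parse_row(ROW, escapechar="\\")
    except csv.Error as exc:
        print("strict parser rejected the row:", exc)
```

If this is indeed the cause, changing the reader's escape character to one that never occurs in the data would be the setting to experiment with first; that is a suggestion to test, not a verified fix.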