My comma-delimited .csv has no header row; the rows look like this:
,310795849829453824,AAAAAQ==,Z3JvdXAtY2hhdA==,,
,310795709316075520,AAAAAA==,,,
,310778976203182080,AAAAAQ==,Z3JvdXAtY2hhdA==,,
,310795895400566784,AAAAAA==,,,
,310791016598736896,AAAAAQ==,Z3JvdXAtY2hhdA==,,
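For reference, the Base64 fields in those rows decode to the binary payloads that show up in the error further down. A quick Python check:

```python
import base64

# The fourth column of the sample rows is the chat name.
print(base64.b64decode('Z3JvdXAtY2hhdA=='))  # b'group-chat'

# The third column is the chat type, a 4-byte value.
print(base64.b64decode('AAAAAQ=='))  # b'\x00\x00\x00\x01'
```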
It is stored in S3, and I defined the external table as follows:
create external table s3_chats(
  AVATAR BINARY,
  CHAT_ID BIGINT,
  CHAT_TYPE BINARY,
  NAME BINARY,
  VENUE_ID BINARY,
  VENUE_NAME BINARY)
row format delimited fields terminated by ','
location 's3://dynamocsv/export/chats/'
Everything works fine up to this point, but when I try

INSERT OVERWRITE TABLE dynamo_chats SELECT * FROM s3_chats
a DynamoDB error is raised:
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"avatar":,"chat_id":310795849829453824,"chat_type":^@^@^@^A,"name":group-chat,"venue_id":,"venue_name":}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"avatar":,"chat_id":310795849829453824,"chat_type":^@^@^@^A,"name":group-chat,"venue_id":,"venue_name":}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: java.lang.RuntimeException: com.amazonaws.AmazonServiceException: One or more parameter values were invalid: An AttributeValue may not contain a null or empty binary type. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: 1UQEO408OB918GASELHM5V2NB3VV4KQNSO5AEMVJF66Q9ASUAAJG)
at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.handleException(DynamoDBFibonacciRetryer.java:107)
at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.runWithRetry(DynamoDBFibonacciRetryer.java:83)
at org.apache.hadoop.dynamodb.DynamoDBClient.writeBatch(DynamoDBClient.java:217)
at org.apache.hadoop.dynamodb.DynamoDBClient.putBatch(DynamoDBClient.java:167)
at org.apache.hadoop.dynamodb.write.AbstractDynamoDBRecordWriter.write(AbstractDynamoDBRecordWriter.java:92)
at org.apache.hadoop.hive.dynamodb.write.HiveDynamoDBRecordWriter.write(HiveDynamoDBRecordWriter.java:29)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:649)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more
This actually makes sense: you shouldn't include attributes you don't want written in the request.
Is this a bug in DynamoDBStorageHandler, or is there some workaround for writing optional fields through Hive?
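One thing I considered (untested): since the validation error complains about empty binary values, mapping the empty fields to NULL in the SELECT might let the connector drop those attributes entirely — assuming the storage handler skips NULL columns instead of sending them as empty binaries:

```sql
-- Hypothetical workaround sketch: replace empty values with NULL so the
-- DynamoDB connector can (hopefully) omit those attributes from the
-- PutItem request rather than sending empty binary values.
INSERT OVERWRITE TABLE dynamo_chats
SELECT
  if(length(avatar) = 0, cast(null AS BINARY), avatar),
  chat_id,
  chat_type,
  if(length(name) = 0, cast(null AS BINARY), name),
  if(length(venue_id) = 0, cast(null AS BINARY), venue_id),
  if(length(venue_name) = 0, cast(null AS BINARY), venue_name)
FROM s3_chats;
```

Whether `length()` behaves this way on BINARY columns, and whether the handler actually skips NULLs, is exactly what I'm unsure about.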