Exporting data from S3 to DynamoDB via Hive

Date: 2015-02-09 16:13:50

Tags: amazon-s3 hive amazon-dynamodb emr amazon-emr

I have a comma-separated .csv file with no header row; its rows look like this:

,310795849829453824,AAAAAQ==,Z3JvdXAtY2hhdA==,,
,310795709316075520,AAAAAA==,,,
,310778976203182080,AAAAAQ==,Z3JvdXAtY2hhdA==,,
,310795895400566784,AAAAAA==,,,
,310791016598736896,AAAAAQ==,Z3JvdXAtY2hhdA==,,

The file is stored in S3, and I define the external table as follows:

CREATE EXTERNAL TABLE s3_chats(
    AVATAR BINARY,
    CHAT_ID BIGINT,
    CHAT_TYPE BINARY,
    NAME BINARY,
    VENUE_ID BINARY,
    VENUE_NAME BINARY)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://dynamocsv/export/chats/';

Everything works fine up to this point, but when I try

INSERT OVERWRITE TABLE dynamo_chats SELECT * FROM s3_chats

DynamoDB raises an error:

-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"avatar":,"chat_id":310795849829453824,"chat_type":^@^@^@^A,"name":group-chat,"venue_id":,"venue_name":}
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"avatar":,"chat_id":310795849829453824,"chat_type":^@^@^@^A,"name":group-chat,"venue_id":,"venue_name":}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
        ... 8 more
Caused by: java.lang.RuntimeException: com.amazonaws.AmazonServiceException: One or more parameter values were invalid: An AttributeValue may not contain a null or empty binary type. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: 1UQEO408OB918GASELHM5V2NB3VV4KQNSO5AEMVJF66Q9ASUAAJG)
        at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.handleException(DynamoDBFibonacciRetryer.java:107)
        at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.runWithRetry(DynamoDBFibonacciRetryer.java:83)
        at org.apache.hadoop.dynamodb.DynamoDBClient.writeBatch(DynamoDBClient.java:217)
        at org.apache.hadoop.dynamodb.DynamoDBClient.putBatch(DynamoDBClient.java:167)
        at org.apache.hadoop.dynamodb.write.AbstractDynamoDBRecordWriter.write(AbstractDynamoDBRecordWriter.java:92)
        at org.apache.hadoop.hive.dynamodb.write.HiveDynamoDBRecordWriter.write(HiveDynamoDBRecordWriter.java:29)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:649)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
        ... 9 more

This actually makes sense: you shouldn't include attributes in the request that you don't want to write, and DynamoDB rejects empty binary values outright.
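One workaround worth trying (an untested sketch, and it assumes the DynamoDB writer skips columns whose value is NULL rather than sending them as empty binaries) is to map empty values to NULL in the SELECT instead of passing them through:

```sql
-- Hypothetical workaround: turn zero-length binary values into NULL so the
-- storage handler can omit those attributes from the PutItem request.
-- length() on BINARY columns may require a reasonably recent Hive version.
INSERT OVERWRITE TABLE dynamo_chats
SELECT
    IF(length(avatar) = 0,     CAST(NULL AS BINARY), avatar),
    chat_id,
    IF(length(chat_type) = 0,  CAST(NULL AS BINARY), chat_type),
    IF(length(name) = 0,       CAST(NULL AS BINARY), name),
    IF(length(venue_id) = 0,   CAST(NULL AS BINARY), venue_id),
    IF(length(venue_name) = 0, CAST(NULL AS BINARY), venue_name)
FROM s3_chats;
```

Whether this helps depends entirely on how DynamoDBStorageHandler treats NULL columns, which is exactly the open question below.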

Is this a bug in DynamoDBStorageHandler, or is there a workaround for writing optional fields through Hive?

0 Answers:

No answers yet.