How to fix HIVE_CURSOR_ERROR on several columns in Athena

Asked: 2018-01-09 15:02:23

Tags: amazon-web-services hive hiveql amazon-athena

I am trying to execute the following SELECT statement in AWS Athena:

SELECT
    col_1,
    col_2
FROM "my_database"."my_table"
WHERE
        partition_1='20171130'
    AND
        partition_2='Y'
LIMIT 10

I get the error:

Your query has the following error(s):

HIVE_CURSOR_ERROR: Can not read value at 0 in block 0 in file s3://my-s3-path/my-table/partition_1=20171130/partition_2=Y/part-1111-11111111-1111-1111-1111-111111111111.snappy.parquet

This query ran against the "my_database" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 1111111-1111-1111-1111-111111111111.

But when I select only one column, it works! For example, each of these single-column queries succeeds:

  • SELECT col_1 FROM "my_database"."my_table" WHERE partition_1='20171130' AND partition_2='Y' LIMIT 10
  • SELECT col_2 FROM "my_database"."my_table" WHERE partition_1='20171130' AND partition_2='Y' LIMIT 10

Furthermore, I found that I can select more than one column; the query only fails for certain combinations. But why? The table definition is:

{
  "Table": {
    "StorageDescriptor": {
      "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
      "SortColumns": [],
      "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
      "SerdeInfo": {
        "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
        "Parameters": {
          "serialization.format": "1"
        }
      },
      "BucketColumns": [],
      "Parameters": {
        "CrawlerSchemaDeserializerVersion": "1.0",
        "compressionType": "none",
        "UPDATED_BY_CRAWLER": "myCrawler",
        "classification": "parquet",
        "recordCount": "40190451",
        "typeOfData": "file",
        "CrawlerSchemaSerializerVersion": "1.0",
        "objectCount": "18",
        "averageRecordSize": "35",
        "exclusions": "[\"s3://my-s3-path/my_table/_**\"]",
        "sizeKey": "1078884110"
      },
      "Location": "s3://my-s3-path/my-table/",
      "NumberOfBuckets": -1,
      "StoredAsSubDirectories": false,
      "Columns": [
        {
          "Type": "smallint",
          "Name": "col_1"
        },
        {
          "Type": "decimal(18,6)",
          "Name": "col_2"
        }
      ],
      "Compressed": false
    },
    "UpdateTime": 1515503623.0,
    "PartitionKeys": [
      {
        "Type": "string",
        "Name": "partition_1"
      },
      {
        "Type": "string",
        "Name": "partition_2"
      }
    ],
    "Name": "my_table",
    "Parameters": {
      "CrawlerSchemaDeserializerVersion": "1.0",
      "compressionType": "none",
      "UPDATED_BY_CRAWLER": "myCrawler",
      "classification": "parquet",
      "recordCount": "40190451",
      "typeOfData": "file",
      "CrawlerSchemaSerializerVersion": "1.0",
      "objectCount": "18",
      "averageRecordSize": "35",
      "exclusions": "[\"s3://my-s3-path/my_table/_**\"]",
      "sizeKey": "1078884110"
    },
    "LastAccessTime": 1515503623.0,
    "CreatedBy": "arn:aws:sts::111111111111:assumed-role/MyRole/myCrawler",
    "TableType": "EXTERNAL_TABLE",
    "Owner": "owner",
    "CreateTime": 1515503623.0,
    "Retention": 0
  }
}

1 answer:

Answer 0 (score: 0)

Typically, this error is caused by a schema or data mismatch in the files. For Parquet, make sure the schema of the files exactly matches the Hive schema. Even the order of columns can sometimes matter, for example with structs.
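To see how a schema mismatch like the one described above can be detected, here is a minimal sketch that compares the table's declared Hive columns against the schema stored in a Parquet file footer. The footer schema below is hand-written for illustration (in practice you would dump it with `parquet-tools schema <file>` or a Parquet library); the `int32`/`double` physical types and the compatibility table are assumptions, not a complete mapping.

```python
# Declared Hive/Glue columns, as in the table definition above.
hive_columns = [("col_1", "smallint"), ("col_2", "decimal(18,6)")]

# Hypothetical schema read from a "bad" Parquet file's footer:
# col_2 was written as a plain double instead of a decimal.
parquet_columns = [("col_1", "int32"), ("col_2", "double")]

# Physical Parquet types that can back each Hive type (illustrative,
# not exhaustive; e.g. smallint is stored as int32 in Parquet).
compatible = {
    "smallint": {"int32"},
    "decimal(18,6)": {"fixed_len_byte_array", "binary", "int64"},
}

def find_mismatches(hive_cols, parquet_cols):
    """Return (name, hive_type, parquet_type) for each column whose
    stored Parquet type cannot back the declared Hive type."""
    stored = dict(parquet_cols)
    problems = []
    for name, hive_type in hive_cols:
        ptype = stored.get(name)
        if ptype is None or ptype not in compatible.get(hive_type, set()):
            problems.append((name, hive_type, ptype))
    return problems

print(find_mismatches(hive_columns, parquet_columns))
# col_2 declared decimal(18,6) but stored as double -> read error in Athena
```

This would also explain why some column combinations fail and others succeed: only queries that touch a mismatched column trip over the bad file.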

Also, Parquet writers sometimes drop sparse fields entirely when no row in an output file contains any data for that column. This is common with partitioned data.
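When a writer drops sparse (all-null) columns like this, individual files in the same partition can end up with different schemas, so only some files are readable for a given column. A small sketch of that check, with made-up file names and footer column lists:

```python
# Hypothetical per-file footer schemas for one partition; the second
# file's writer dropped col_2 because every value in it was null.
file_schemas = {
    "part-0000.snappy.parquet": ["col_1", "col_2"],
    "part-0001.snappy.parquet": ["col_1"],
}

# Columns the table definition says should exist.
expected = ["col_1", "col_2"]

def files_missing_columns(schemas, expected_cols):
    """Map each file to the declared columns absent from its footer."""
    return {
        f: [c for c in expected_cols if c not in cols]
        for f, cols in schemas.items()
        if any(c not in cols for c in expected_cols)
    }

print(files_missing_columns(file_schemas, expected))
```

Running a check like this over every object under the partition path would identify which file(s) to rewrite or exclude.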