How to fix HIVE_CURSOR_ERROR on several columns in Athena

Asked: 2018-01-09 15:02:23

Tags: amazon-web-services hive hiveql amazon-athena

I am trying to execute the following SELECT statement in AWS Athena:

SELECT
    col_1,
    col_2
FROM "my_database"."my_table"
WHERE
        partition_1='20171130'
    AND
        partition_2='Y'
LIMIT 10

I get the error:

Your query has the following error(s):

HIVE_CURSOR_ERROR: Can not read value at 0 in block 0 in file s3://my-s3-path/my-table/partition_1=20171130/partition_2=Y/part-1111-11111111-1111-1111-1111-111111111111.snappy.parquet

This query ran against the "my_database" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 1111111-1111-1111-1111-111111111111.

But when I select only one column, it works! For example, each of these single-column queries succeeds:

  • SELECT col_1 FROM "my_database"."my_table" WHERE partition_1='20171130' AND partition_2='Y' LIMIT 10
  • SELECT col_2 FROM "my_database"."my_table" WHERE partition_1='20171130' AND partition_2='Y' LIMIT 10

Furthermore, I found that I can select more than one column; the query only fails for certain combinations. But why? The table definition is:

{
  "Table": {
    "StorageDescriptor": {
      "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
      "SortColumns": [],
      "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
      "SerdeInfo": {
        "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
        "Parameters": {
          "serialization.format": "1"
        }
      },
      "BucketColumns": [],
      "Parameters": {
        "CrawlerSchemaDeserializerVersion": "1.0",
        "compressionType": "none",
        "UPDATED_BY_CRAWLER": "myCrawler",
        "classification": "parquet",
        "recordCount": "40190451",
        "typeOfData": "file",
        "CrawlerSchemaSerializerVersion": "1.0",
        "objectCount": "18",
        "averageRecordSize": "35",
        "exclusions": "[\"s3://my-s3-path/my_table/_**\"]",
        "sizeKey": "1078884110"
      },
      "Location": "s3://my-s3-path/my-table/",
      "NumberOfBuckets": -1,
      "StoredAsSubDirectories": false,
      "Columns": [
        {
          "Type": "smallint",
          "Name": "col_1"
        },
        {
          "Type": "decimal(18,6)",
          "Name": "col_2"
        }
      ],
      "Compressed": false
    },
    "UpdateTime": 1515503623.0,
    "PartitionKeys": [
      {
        "Type": "string",
        "Name": "partition_1"
      },
      {
        "Type": "string",
        "Name": "partition_2"
      }
    ],
    "Name": "my_table",
    "Parameters": {
      "CrawlerSchemaDeserializerVersion": "1.0",
      "compressionType": "none",
      "UPDATED_BY_CRAWLER": "myCrawler",
      "classification": "parquet",
      "recordCount": "40190451",
      "typeOfData": "file",
      "CrawlerSchemaSerializerVersion": "1.0",
      "objectCount": "18",
      "averageRecordSize": "35",
      "exclusions": "[\"s3://my-s3-path/my_table/_**\"]",
      "sizeKey": "1078884110"
    },
    "LastAccessTime": 1515503623.0,
    "CreatedBy": "arn:aws:sts::111111111111:assumed-role/MyRole/myCrawler",
    "TableType": "EXTERNAL_TABLE",
    "Owner": "owner",
    "CreateTime": 1515503623.0,
    "Retention": 0
  }
}

1 answer:

Answer 0 (score: 0)

Typically, this error is caused by a schema or data mismatch in the files. For Parquet, make sure the schema of the files exactly matches the Hive schema. Even the order of columns can sometimes matter, for example with structs.
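To see how a schema mismatch like the one described above can be detected, here is a minimal sketch that compares the table's declared Hive columns against the schema stored in a Parquet file footer. The footer schema below is hand-written for illustration (in practice you would dump it with `parquet-tools schema <file>` or a Parquet library); the `int32`/`double` physical types and the compatibility table are assumptions, not a complete mapping.

```python
# Declared Hive/Glue columns, as in the table definition above.
hive_columns = [("col_1", "smallint"), ("col_2", "decimal(18,6)")]

# Hypothetical schema read from a "bad" Parquet file's footer:
# col_2 was written as a plain double instead of a decimal.
parquet_columns = [("col_1", "int32"), ("col_2", "double")]

# Physical Parquet types that can back each Hive type (illustrative,
# not exhaustive; e.g. smallint is stored as int32 in Parquet).
compatible = {
    "smallint": {"int32"},
    "decimal(18,6)": {"fixed_len_byte_array", "binary", "int64"},
}

def find_mismatches(hive_cols, parquet_cols):
    """Return (name, hive_type, parquet_type) for each column whose
    stored Parquet type cannot back the declared Hive type."""
    stored = dict(parquet_cols)
    problems = []
    for name, hive_type in hive_cols:
        ptype = stored.get(name)
        if ptype is None or ptype not in compatible.get(hive_type, set()):
            problems.append((name, hive_type, ptype))
    return problems

print(find_mismatches(hive_columns, parquet_columns))
# col_2 declared decimal(18,6) but stored as double -> read error in Athena
```

This would also explain why some column combinations fail and others succeed: only queries that touch a mismatched column trip over the bad file.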

Also, Parquet writers sometimes drop sparse fields entirely when no row in an output file contains any data for that column. This is common with partitioned data.
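When a writer drops sparse (all-null) columns like this, individual files in the same partition can end up with different schemas, so only some files are readable for a given column. A small sketch of that check, with made-up file names and footer column lists:

```python
# Hypothetical per-file footer schemas for one partition; the second
# file's writer dropped col_2 because every value in it was null.
file_schemas = {
    "part-0000.snappy.parquet": ["col_1", "col_2"],
    "part-0001.snappy.parquet": ["col_1"],
}

# Columns the table definition says should exist.
expected = ["col_1", "col_2"]

def files_missing_columns(schemas, expected_cols):
    """Map each file to the declared columns absent from its footer."""
    return {
        f: [c for c in expected_cols if c not in cols]
        for f, cols in schemas.items()
        if any(c not in cols for c in expected_cols)
    }

print(files_missing_columns(file_schemas, expected))
```

Running a check like this over every object under the partition path would identify which file(s) to rewrite or exclude.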