Hive cannot use a nested Avro schema

Date: 2017-05-27 07:36:55

Tags: hive nested schema avro

I am trying to create a Hive table from a nested Avro schema, but it doesn't work. I am using Hive 1.1 on CDH 5.7.2.

Here is my nested Avro schema:

[
    {
        "type": "record",
        "name": "Id",
        "namespace": "com.test.app_list",
        "doc": "Device ID",
        "fields": [
            {
                "name": "idType",
                "type": "int"
            },{
                "name": "id",
                "type": "string"
            }
        ]
    },

    {
        "type": "record",
        "name": "AppList",
        "namespace": "com.test.app_list",
        "doc": "",
        "fields": [
            {
                "name": "appId",
                "type": "string",
                "avro.java.string": "String"
            },
            {
                "name": "timestamp",
                "type":  "long"
            },

            {
                "name": "idList",
                "type": [{"type": "array", "items": "com.test.app_list.Id"}]
            }

        ]
    }
]

My CREATE TABLE statement:

CREATE EXTERNAL TABLE app_list
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='/hive/schema/test_app_list.avsc');

But Hive gives me:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.avro.AvroSerdeException Schema for table must be of type RECORD. Received type: UNION)

The Hive documentation says "Supports arbitrarily nested schemas." (source: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-Overview–WorkingwithAvrofromHive)

Data sample:

{
    "appId":{"string":"com.test.app"},
    "timestamp":{"long":1495893601606},
    "idList":{
        "array":[
            {"idType":15,"id":"6c:5c:14:c3:a5:39"},
            {"idType":13,"id":"eb297afe56ff340b6bb7de5c5ab09193"}
        ]
    }

}

But I don't know how to fix this. I need some help solving this problem. Thanks!

1 answer:

Answer 0 (score: 0)

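Hive expects the top level of the Avro schema to be a record type. The top level of your schema file is a JSON array, which Avro interprets as a UNION of the two records, and that is why Hive rejects it ("Schema for table must be of type RECORD. Received type: UNION"). A workaround is to make the top level a single record and define the other record type inside it, as the type of one of its fields.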
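A minimal sketch of that workaround, assuming the goal is one Hive table with the columns appId, timestamp and idList: AppList becomes the single top-level record, and Id is defined inline as the item type of idList instead of as a separate top-level entry, so the file no longer parses as a UNION. This is an illustration written for this question, not the answerer's original schema; the inline Id record inherits the namespace com.test.app_list from the enclosing record.

```json
{
    "type": "record",
    "name": "AppList",
    "namespace": "com.test.app_list",
    "doc": "",
    "fields": [
        {
            "name": "appId",
            "type": "string",
            "avro.java.string": "String"
        },
        {
            "name": "timestamp",
            "type": "long"
        },
        {
            "name": "idList",
            "type": [
                {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "Id",
                        "doc": "Device ID",
                        "fields": [
                            {"name": "idType", "type": "int"},
                            {"name": "id", "type": "string"}
                        ]
                    }
                }
            ]
        }
    ]
}
```

With this file at /hive/schema/test_app_list.avsc, the CREATE EXTERNAL TABLE statement from the question should not need to change. One caveat: Hive's AvroSerDe only collapses a union of the form [null, X] into a nullable X, so the single-branch union kept around idList from the original will likely surface as a uniontype column. Declaring idList's type directly as the array (without the surrounding [ ]) should give a plain array of structs instead.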
