I am looking for a way, using Hive, to take the following AVSC file and externalize the nested schema "RENTALRECORDTYPE" so that it can be reused by other schemas.
{
  "type": "record",
  "name": "EMPLOYEE",
  "namespace": "",
  "doc": "EMPLOYEE is a person that works here",
  "fields": [
    {
      "name": "RENTALRECORD",
      "type": {
        "type": "record",
        "name": "RENTALRECORDTYPE",
        "namespace": "",
        "doc": "Rental record is a record that is kept on every item rented",
        "fields": [
          {
            "name": "due_date",
            "doc": "The date when item is due",
            "type": "int"
          }
        ]
      }
    },
    {
      "name": "hire_date",
      "doc": "Employee date of hire",
      "type": "int"
    }
  ]
}
Defining the schema this way works fine. I can issue the following HiveQL statement and the table is created successfully.
CREATE EXTERNAL TABLE employee
STORED AS AVRO
LOCATION '/user/dtom/store/data/employee'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/employee.avsc');
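However, I would like to reference an existing schema rather than duplicate the record definition in multiple schemas. For example, two AVSC files would be produced instead of a single schema file: rentalrecord.avsc and employee.avsc.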
rentalrecord.avsc
{
  "type": "record",
  "name": "RENTALRECORD",
  "namespace": "",
  "doc": "A record that is kept for every rental",
  "fields": [
    {
      "name": "due_date",
      "doc": "The date on which the rental is due back to the store",
      "type": "int"
    }
  ]
}
employee.avsc
{
  "type": "record",
  "name": "EMPLOYEE",
  "namespace": "",
  "doc": "EMPLOYEE is a person that works for the VIDEO STORE",
  "fields": [
    {
      "name": "rentalrecord",
      "doc": "A rental record is a record on every rental",
      "type": "RENTALRECORD"
    },
    {
      "name": "hire_date",
      "doc": "Employee date of hire",
      "type": "int"
    }
  ]
}
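In the scenario above, we want to externalize the RENTALRECORD schema definition so that it can be reused in employee.avsc and elsewhere.
When I try to import the schemas with the following two HiveQL statements, the second one fails: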
CREATE EXTERNAL TABLE rentalrecord
STORED AS AVRO
LOCATION '/user/dtom/store/data/rentalrecord'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/rentalrecord.avsc');
CREATE EXTERNAL TABLE employee
STORED AS AVRO
LOCATION '/user/dtom/store/data/employee'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/employee.avsc');
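rentalrecord.avsc is imported successfully, but employee.avsc fails on the first field definition, the field of type "RENTALRECORD". Hive outputs the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining schema. Returning signal schema to indicate problem: "RENTALRECORD" is not a defined name. The type of the "rentalrecord" field must be a defined name or a {"type": ...} expression.)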
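My research tells me that Avro does support this form of schema reuse. So either I am missing something, or this is something Hive does not support.
Any help would be greatly appreciated.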
Answer 0 (score: 0)
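I defined an AVDL file containing all of the referenced types, then used the avro-tools jar with the idl2schemata option to generate the avsc. The generated avsc works like a charm with Hive!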
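The error in the question says that "RENTALRECORD" is not a defined name when employee.avsc is read, so the schema handed to Hive needs to carry the full definition itself. To illustrate what such a self-contained schema looks like, here is a minimal Python sketch that inlines the RENTALRECORD reference into the EMPLOYEE schema by hand. This is not what idl2schemata does internally; it is a swapped-in illustration that uses only the standard json module. The helper name inline_named_types is hypothetical, and the sketch assumes every schema uses the empty namespace, as in the question, so that a simple name is enough to identify a type.

```python
import json


def inline_named_types(schema, definitions, seen=None):
    """Return a copy of `schema` with references to `definitions` expanded.

    Hypothetical helper. Assumes all names live in the empty namespace.
    The first use of a name is replaced by its full definition; later uses
    stay as plain names, because Avro allows a name to be defined only once.
    """
    seen = set() if seen is None else seen
    if isinstance(schema, str):
        # A bare string is either a primitive ("int") or a named reference.
        if schema in definitions and schema not in seen:
            return inline_named_types(definitions[schema], definitions, seen)
        return schema
    if isinstance(schema, list):
        # A JSON array is a union; resolve each branch.
        return [inline_named_types(branch, definitions, seen) for branch in schema]
    result = dict(schema)  # shallow copy so the input is not mutated
    kind = result.get("type")
    if kind in ("record", "enum", "fixed"):
        seen.add(result["name"])
    if kind == "record":
        result["fields"] = [
            {**field, "type": inline_named_types(field["type"], definitions, seen)}
            for field in result["fields"]
        ]
    elif kind == "array":
        result["items"] = inline_named_types(result["items"], definitions, seen)
    elif kind == "map":
        result["values"] = inline_named_types(result["values"], definitions, seen)
    return result


# The two schemas from the question, embedded here so the sketch is runnable.
RENTALRECORD = json.loads("""
{"type": "record", "name": "RENTALRECORD", "namespace": "",
 "doc": "A record that is kept for every rental",
 "fields": [{"name": "due_date",
             "doc": "The date on which the rental is due back to the store",
             "type": "int"}]}
""")

EMPLOYEE = json.loads("""
{"type": "record", "name": "EMPLOYEE", "namespace": "",
 "doc": "EMPLOYEE is a person that works for the VIDEO STORE",
 "fields": [{"name": "rentalrecord",
             "doc": "A rental record is a record on every rental",
             "type": "RENTALRECORD"},
            {"name": "hire_date", "doc": "Employee date of hire", "type": "int"}]}
""")

merged = inline_named_types(EMPLOYEE, {"RENTALRECORD": RENTALRECORD})
print(json.dumps(merged, indent=2))
```

The printed JSON has the same shape as the first, working schema in the question: the rentalrecord field carries the full record definition instead of a bare name. In practice the two schemas would be read from rentalrecord.avsc and employee.avsc, and the merged result written to a new .avsc file for avro.schema.url to point at.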