Can I split an Apache Avro schema across multiple files?

Asked: 2014-02-03 22:24:14

Tags: avro

I can do this:

{
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "bar", "type": {
            "type": "record",
            "name": "Bar",
            "fields": [ ]
        }}
    ]
}

and it works fine, but suppose I want to split the schema into two files, like this:

{
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "bar", "type": "Bar"}
    ]
}

{
    "type": "record",
    "name": "Bar",
    "fields": [ ]
}

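Does Avro have the ability to do this?

6 answers:

Answer 0 (score: 29)

Yes, it is possible.

I have done this in my Java project by declaring the shared schema files as imports in the avro-maven-plugin configuration. For example: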

search_result.avro:

{"namespace": "com.myorg.other",
 "type": "record",
 "name": "SearchResult",
 "fields": [
     {"name": "type", "type": "SearchResultType"},
     {"name": "keyWord",  "type": "string"},
     {"name": "searchEngine", "type": "string"},
     {"name": "position", "type": "int"},
     {"name": "userAction", "type": "UserAction"}
 ]
}

search_suggest.avro:

{"namespace": "com.myorg.other",
 "type": "record",
 "name": "SearchSuggest",
 "fields": [
     {"name": "suggest", "type": "string"},
     {"name": "request",  "type": "string"},
     {"name": "searchEngine", "type": "string"},
     {"name": "position", "type": "int"},
     {"name": "userAction", "type": "UserAction"},
     {"name": "timestamp", "type": "long"}
 ]
}

user_action.avro:

{"namespace": "com.myorg.other",
 "type": "enum",
 "name": "UserAction",
 "symbols": ["S", "V", "C"]
}

search_result_type.avro:

{"namespace": "com.myorg.other",
 "type": "enum",
 "name": "SearchResultType",
 "symbols": ["O", "S", "A"]
}

avro-maven-plugin configuration:

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.7.4</version>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
                <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                <includes>
                    <include>**/*.avro</include>
                </includes>
                <imports>
                    <import>${project.basedir}/src/main/resources/avro/user_action.avro</import>
                    <import>${project.basedir}/src/main/resources/avro/search_result_type.avro</import>
                </imports>
            </configuration>
        </execution>
    </executions>
</plugin>

Answer 1 (score: 20)

You can also define multiple schemas in one file:

schemas.avsc:

[
{
    "type": "record",
    "name": "Bar",
    "fields": [ ]
},
{
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "bar", "type": "Bar"}
    ]
}
]

This is not great if you want to reuse the schemas in multiple places, but in my opinion it improves readability and maintainability.
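If you would rather keep each definition in its own file and still ship a single array like the one above, the combined file could be assembled at build time. The following is a minimal sketch in plain Java, not an Avro API: the class `SchemaBundle` and its `bundle` method are hypothetical names, and it assumes the caller passes the schema texts with referenced types first (Bar before Foo).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: joins separately maintained schema texts into one JSON array. */
final class SchemaBundle {

    /** Wraps the given schema JSON texts, in the given order, in a top-level JSON array. */
    static String bundle(List<String> schemaJsonInOrder) {
        List<String> trimmed = new ArrayList<>();
        for (String json : schemaJsonInOrder) {
            trimmed.add(json.trim()); // drop leading/trailing whitespace of each file
        }
        return "[\n" + String.join(",\n", trimmed) + "\n]";
    }

    public static void main(String[] args) {
        // Assumption: these strings stand in for the contents of Bar and Foo schema files.
        String bar = "{\"type\": \"record\", \"name\": \"Bar\", \"fields\": []}";
        String foo = "{\"type\": \"record\", \"name\": \"Foo\", \"fields\": ["
                + "{\"name\": \"bar\", \"type\": \"Bar\"}]}";
        System.out.println(bundle(List.of(bar, foo)));
    }
}
```

The helper only concatenates text; it does not validate the JSON or check that the order is correct.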

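Answer 2 (score: 5)

I assume your motivation is (as is my own) to structure your schema definition and avoid copy-and-paste errors.

To achieve that, you can also use Avro IDL. It lets you define Avro schemas at a higher level. Types can be reused within the same file as well as across multiple files.

To generate the .avsc files, run: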

$ java -jar avro-tools-1.7.7.jar idl2schemata my-protocol.avdl

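The resulting .avsc files will look pretty much the same as your initial example, but because they are generated from the .avdl you will not get lost in the verbose JSON format.

Answer 3 (score: 1)

The order of the imports in pom.xml matters. Subtypes must be imported before the rest is processed.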

<imports>
    <import>${project.basedir}/src/main/resources/avro/Bar.avro</import>
    <import>${project.basedir}/src/main/resources/avro/Foo.avro</import>
</imports>

This prevents codegen from raising the undefined name: Bar.avro error.
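The ordering rule above can be sketched in plain Java. This is only an illustration, not part of avro-maven-plugin: the class `ImportOrder`, its `order` method, and the hand-written dependency map are hypothetical, and the sketch assumes you already know which file references types from which other file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: orders schema files so that referenced types come first. */
final class ImportOrder {

    /**
     * @param deps maps each schema file to the files that define the types it references
     * @return the files in an order where every dependency precedes its dependents
     */
    static List<String> order(Map<String, List<String>> deps) {
        Set<String> done = new LinkedHashSet<>();     // files already emitted, in order
        Set<String> visiting = new LinkedHashSet<>(); // files on the current path
        for (String file : deps.keySet()) {
            visit(file, deps, done, visiting);
        }
        return new ArrayList<>(done);
    }

    private static void visit(String file, Map<String, List<String>> deps,
                              Set<String> done, Set<String> visiting) {
        if (done.contains(file)) {
            return;
        }
        if (!visiting.add(file)) {
            throw new IllegalStateException("circular reference involving " + file);
        }
        for (String dep : deps.getOrDefault(file, List.of())) {
            visit(dep, deps, done, visiting); // emit dependencies before the file itself
        }
        visiting.remove(file);
        done.add(file);
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("Foo.avro", List.of("Bar.avro")); // Foo has a field of type Bar
        deps.put("Bar.avro", List.of());
        for (String file : order(deps)) {
            System.out.println("<import>${project.basedir}/src/main/resources/avro/"
                    + file + "</import>");
        }
    }
}
```

For the two files in this answer, the sketch lists Bar.avro before Foo.avro, which matches the imports block shown above.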

Answer 4 (score: 0)

From what I have been able to figure out so far, no.

There is a good write-up by someone who coded their own approach:

http://www.infoq.com/articles/ApacheAvro

Answer 5 (score: 0)

You need to import, in the avro-maven-plugin configuration, the .avsc file in which you first defined the object schema you want to reuse.
