Problem creating an .avsc schema in Avro

Time: 2015-02-28 21:13:25

Tags: json avro

I am having trouble creating an Avro schema. My schema is below.

twitter.avsc:

{
  "type" : "record",
  "name" : "twitter_schema",
  "namespace" : "com.miguno.avro",
  "fields" : [
    { "name" : "_id", "type" : "record", "doc" : "Values of the indexes/id tweets"},
    { "name" : "nome","type" : "string","doc" : "Name of the user account on Twitter.com" },
    { "name" : "tweet", "type" : "string","doc" : "The content of the user's Twitter message" },
    { "name" : "datahora", "type" : "string","doc" : "Unix epoch time in seconds"}

    ],
  "doc:" : "A schema for storing Twitter messages"
}

When I try to convert tweet.json to .avro, I get the following error:

Exception in thread "main" org.apache.avro.SchemaParseException: "record" is not a defined name. The type of the "_id" field must be a defined name or a {"type": ...} expression.
    at org.apache.avro.Schema.parse(Schema.java:1199)
    at org.apache.avro.Schema$Parser.parse(Schema.java:965)
    at org.apache.avro.Schema$Parser.parse(Schema.java:938)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:82)
    at org.apache.avro.tool.Main.run(Main.java:84)
    at org.apache.avro.tool.Main.main(Main.java:73)
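
The error comes from how a field's "type" is resolved. When it is a bare JSON string, Avro expects either a primitive type name (string, int, and so on) or the name of a record, enum or fixed type that has already been defined. "record" is neither: it is only meaningful as the "type" attribute inside a JSON object that defines a new record. The check_type function below is hypothetical and not part of any Avro library; it is a simplified sketch of that lookup rule that ignores namespaces, aliases and most other validation.

```python
# Avro primitive type names, per the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def check_type(t, defined):
    """Simplified sketch of Avro type resolution (hypothetical helper).

    Returns a list of error messages. `defined` collects the names of
    record/enum/fixed types seen so far (namespaces are ignored here).
    """
    errors = []
    if isinstance(t, str):
        # A bare string must be a primitive or an already-defined name.
        if t not in PRIMITIVES and t not in defined:
            errors.append(f'"{t}" is not a defined name')
    elif isinstance(t, list):
        # A JSON array is a union: check every branch.
        for branch in t:
            errors.extend(check_type(branch, defined))
    elif isinstance(t, dict):
        kind = t.get("type")
        if kind == "record":
            # Register the name first so fields may refer to it recursively.
            defined.add(t["name"])
            for field in t.get("fields", []):
                errors.extend(check_type(field["type"], defined))
        elif kind in ("enum", "fixed"):
            defined.add(t["name"])
        elif kind == "array":
            errors.extend(check_type(t["items"], defined))
        elif kind == "map":
            errors.extend(check_type(t["values"], defined))
        else:
            # e.g. {"type": "string"} wraps a primitive or a named reference.
            errors.extend(check_type(kind, defined))
    return errors


# The "_id" field from twitter.avsc, reduced to the relevant part.
broken = {
    "type": "record",
    "name": "twitter_schema",
    "fields": [
        {"name": "_id", "type": "record"},
        {"name": "nome", "type": "string"},
    ],
}
print(check_type(broken, set()))  # ['"record" is not a defined name']
```

Defining the nested record inline, as a {"type": "record", "name": ..., "fields": [...]} object, makes the lookup succeed because the new name is registered before it is needed.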

Below is the .json file I am trying to convert to .avro.

tweet.json:

{ "_id" : { "$oid" : "54d148b471eb130b1e8b4567" }, "nome" : "Marco Correia", "tweet" : "Globo repassará R$ 300 milhões /clubes  http://t.co/SQIjscDolU Vão entrar 45 milhões /Flamengo nesse Mês e Março e o clube não tem Grana!Sei", "datahora" : "Tue Feb 03 22:15:54 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b4568" }, "nome" : "FLUMINENSE F.C.", "tweet" : "Jornalheiros - Flamengo x Barra Mansa - Transmissão ao vivo (04/02/2015, 22:00, Maracanã) http://t.co/BYQk3swWqf", "datahora" : "Tue Feb 03 22:15:44 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b4569" }, "nome" : "VaiRio - O Globo", "tweet" : "Praia do Flamengo tem fluxo bom no sentido Botafogo, na altura da Rua Dois de Dezembro http://t.co/lWe3IEvAp2", "datahora" : "Tue Feb 03 22:15:44 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456a" }, "nome" : "PC Filho ★★★★", "tweet" : "Jornalheiros - Flamengo x Barra Mansa - Transmissão ao vivo (04/02/2015, 22:00, Maracanã) http://t.co/NArNpqy3tz", "datahora" : "Tue Feb 03 22:15:43 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456b" }, "nome" : "ATL Sports Bar", "tweet" : "SCORE ALERT: #Basketball #Livescore @ScoresPro: (-NBB) #Flamengo Bc vs #Minas: 41-30", "datahora" : "Tue Feb 03 22:15:38 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456c" }, "nome" : "FlamengoNews", "tweet" : " Parcial dos quartos:\n1ºQ - @Flamengo 26x13 Minas\n2ºQ - Flamengo 15x17 Minas", "datahora" : "Tue Feb 03 22:15:33 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456d" }, "nome" : "VaiRio - O Globo", "tweet" : "Rua Mário Ribeiro com trânsito lento no sentido Lagoa, altura do C. R. Flamengo http://t.co/SzhrtTTMz1", "datahora" : "Tue Feb 03 22:15:33 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456e" }, "nome" : "carols", "tweet" : "RT @Flamengo: Esse dia foi LOUCO http://t.co/tEdwRX3bsN", "datahora" : "Tue Feb 03 22:15:30 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456f" }, "nome" : "walisson rodrigues ", "tweet" : "RT @Esp_Interativo: Alô, torcida do @Flamengo! O EI plus estará ABERTO na web para a transmissão do Jogando em Casa com Rodrigo Caetano! ht…", "datahora" : "Tue Feb 03 22:15:28 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b4570" }, "nome" : "Adélio", "tweet" : "Flamengo: eu sou o fã número 520 #365Scores veio e torce por ele também! http://t.co/Fa4ToFWdMB", "datahora" : "Tue Feb 03 22:15:24 +0000 2015" }

1 answer:

Answer 0 (score: 3)

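A field's "type" must be either one of the primitive types or a user-defined Avro type; "record" on its own is neither. A record type has to be defined first (for example inline, as a {"type": "record", ...} object) before it can be used. The .avsc should look like one of the following: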

{
"type": "record",
"name": "twitter_schema",
"namespace": "com.miguno.avro",
"fields": [
    {
        "name": "_id",
        "type": {
            "type": "record",
            "name": "id_schema",
            "namespace": "com.miguno.avro",
            "fields": [
                {
                    "name": "id_name",
                    "type": "string",
                    "doc": "Value of the indexes/id name tweets"
                },
                {
                    "name": "id_value",
                    "type": "string",
                    "doc": "Value of the indexes/id value tweets"
                }
            ],
            "doc": "A schema for storing Values of the indexes/id tweets"
        },
        "doc": "Values of the indexes/id tweets"
    },
    {
        "name": "nome",
        "type": "string",
        "doc": "Name of the user account on Twitter.com"
    },
    {
        "name": "tweet",
        "type": "string",
        "doc": "The content of the user's Twitter message"
    },
    {
        "name": "datahora",
        "type": "string",
        "doc": "Unix epoch time in seconds"
    }
],
"doc": "A schema for storing Twitter messages"
}
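
Note that the sample rows in tweet.json store _id as a Mongo-style {"$oid": "..."} object, which does not have the id_name and id_value fields this schema declares (and "$oid" is not a valid Avro field name), so the data most likely needs reshaping before conversion. The sketch below assumes the intended mapping is id_name = "$oid" and id_value = the ObjectId string; reshape_for_record_schema and reshape_lines are hypothetical helpers, not part of Avro.

```python
import json


def reshape_for_record_schema(line):
    """Turn one tweet.json line into JSON matching the nested-record schema.

    Assumption: {"$oid": "54d1..."} maps to
    {"id_name": "$oid", "id_value": "54d1..."}.
    """
    tweet = json.loads(line)
    # Expect exactly one key/value pair in the Mongo-style _id object;
    # unpacking raises ValueError if there are more.
    [(name, value)] = tweet["_id"].items()
    # Replacing the key in place keeps the field order _id, nome, tweet, datahora.
    tweet["_id"] = {"id_name": name, "id_value": value}
    # ensure_ascii=False keeps the Portuguese text readable in the output.
    return json.dumps(tweet, ensure_ascii=False)


def reshape_lines(lines):
    """Yield reshaped JSON lines, skipping blank ones."""
    for raw in lines:
        if raw.strip():
            yield reshape_for_record_schema(raw)


sample = '{ "_id" : { "$oid" : "54d148b471eb130b1e8b4567" }, "nome" : "Marco Correia", "tweet" : "...", "datahora" : "Tue Feb 03 22:15:54 +0000 2015" }'
print(reshape_for_record_schema(sample))
```

Writing the output of reshape_lines over tweet.json to a new file, one record per line, gives input whose shape matches this schema; the conversion itself would then use the same avro-tools command as before.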

Alternatively, declare _id as an array of strings:

{
"type": "record",
"name": "twitter_schema",
"namespace": "com.miguno.avro",
"fields": [
    {
        "name": "_id",
        "type": {
            "type": "array",
            "items": "string"
        },
        "doc": "Values of the indexes/id tweets"
    },
    {
        "name": "nome",
        "type": "string",
        "doc": "Name of the user account on Twitter.com"
    },
    {
        "name": "tweet",
        "type": "string",
        "doc": "The content of the user's Twitter message"
    },
    {
        "name": "datahora",
        "type": "string",
        "doc": "Unix epoch time in seconds"
    }
],
"doc": "A schema for storing Twitter messages"
}
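
With this array schema, Avro's JSON encoding expects _id to be a plain JSON array of strings, whereas tweet.json holds an object such as {"$oid": "..."}. A minimal sketch of the reshaping, assuming only the ObjectId string needs to be kept (reshape_for_array_schema is a hypothetical helper, not part of Avro):

```python
import json


def reshape_for_array_schema(line):
    """Turn one tweet.json line into JSON matching the array-of-strings schema.

    Assumption: only the ObjectId value is kept, so
    {"$oid": "54d1..."} becomes ["54d1..."].
    """
    tweet = json.loads(line)
    # An Avro array in JSON encoding is a plain JSON array.
    tweet["_id"] = [str(v) for v in tweet["_id"].values()]
    # ensure_ascii=False keeps accented characters as-is in the output.
    return json.dumps(tweet, ensure_ascii=False)


sample = '{ "_id" : { "$oid" : "54d148b471eb130b1e8b4568" }, "nome" : "FLUMINENSE F.C.", "tweet" : "...", "datahora" : "Tue Feb 03 22:15:44 +0000 2015" }'
print(reshape_for_array_schema(sample))
```

This variant drops the "$oid" key name, which is a reasonable trade-off if the key is always the same; the nested-record schema keeps it explicitly.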