I'm having trouble writing an Avro schema. My schema is shown below.
twitter.avsc:
{
"type" : "record",
"name" : "twitter_schema",
"namespace" : "com.miguno.avro",
"fields" : [
{ "name" : "_id", "type" : "record", "doc" : "Values of the indexes/id tweets"},
{ "name" : "nome","type" : "string","doc" : "Name of the user account on Twitter.com" },
{ "name" : "tweet", "type" : "string","doc" : "The content of the user's Twitter message" },
{ "name" : "datahora", "type" : "string","doc" : "Unix epoch time in seconds"}
],
"doc:" : "A schema for storing Twitter messages"
}
When I try to convert tweet.json to .avro, I get the following error:
Exception in thread "main" org.apache.avro.SchemaParseException: "record" is not a defined name. The type of the "_id" field must be a defined name or a {"type": ...} expression.
at org.apache.avro.Schema.parse(Schema.java:1199)
at org.apache.avro.Schema$Parser.parse(Schema.java:965)
at org.apache.avro.Schema$Parser.parse(Schema.java:938)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:82)
at org.apache.avro.tool.Main.run(Main.java:84)
at org.apache.avro.tool.Main.main(Main.java:73)
Below is the .json file I am trying to convert to .avro.
tweet.json:
{ "_id" : { "$oid" : "54d148b471eb130b1e8b4567" }, "nome" : "Marco Correia", "tweet" : "Globo repassará R$ 300 milhões /clubes http://t.co/SQIjscDolU Vão entrar 45 milhões /Flamengo nesse Mês e Março e o clube não tem Grana!Sei", "datahora" : "Tue Feb 03 22:15:54 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b4568" }, "nome" : "FLUMINENSE F.C.", "tweet" : "Jornalheiros - Flamengo x Barra Mansa - Transmissão ao vivo (04/02/2015, 22:00, Maracanã) http://t.co/BYQk3swWqf", "datahora" : "Tue Feb 03 22:15:44 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b4569" }, "nome" : "VaiRio - O Globo", "tweet" : "Praia do Flamengo tem fluxo bom no sentido Botafogo, na altura da Rua Dois de Dezembro http://t.co/lWe3IEvAp2", "datahora" : "Tue Feb 03 22:15:44 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456a" }, "nome" : "PC Filho ★★★★", "tweet" : "Jornalheiros - Flamengo x Barra Mansa - Transmissão ao vivo (04/02/2015, 22:00, Maracanã) http://t.co/NArNpqy3tz", "datahora" : "Tue Feb 03 22:15:43 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456b" }, "nome" : "ATL Sports Bar", "tweet" : "SCORE ALERT: #Basketball #Livescore @ScoresPro: (-NBB) #Flamengo Bc vs #Minas: 41-30", "datahora" : "Tue Feb 03 22:15:38 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456c" }, "nome" : "FlamengoNews", "tweet" : " Parcial dos quartos:\n1ºQ - @Flamengo 26x13 Minas\n2ºQ - Flamengo 15x17 Minas", "datahora" : "Tue Feb 03 22:15:33 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456d" }, "nome" : "VaiRio - O Globo", "tweet" : "Rua Mário Ribeiro com trânsito lento no sentido Lagoa, altura do C. R. Flamengo http://t.co/SzhrtTTMz1", "datahora" : "Tue Feb 03 22:15:33 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456e" }, "nome" : "carols", "tweet" : "RT @Flamengo: Esse dia foi LOUCO http://t.co/tEdwRX3bsN", "datahora" : "Tue Feb 03 22:15:30 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456f" }, "nome" : "walisson rodrigues ", "tweet" : "RT @Esp_Interativo: Alô, torcida do @Flamengo! O EI plus estará ABERTO na web para a transmissão do Jogando em Casa com Rodrigo Caetano! ht…", "datahora" : "Tue Feb 03 22:15:28 +0000 2015" }
{ "_id" : { "$oid" : "54d148b471eb130b1e8b4570" }, "nome" : "Adélio", "tweet" : "Flamengo: eu sou o fã número 520 #365Scores veio e torce por ele também! http://t.co/Fa4ToFWdMB", "datahora" : "Tue Feb 03 22:15:24 +0000 2015" }
Answer 0 (score: 3)
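A field's type must be either one of the primitive types or a user-defined Avro type (a record has to be defined first and can then be used). In your schema, the bare string "record" is read as a reference to a named type called record, which was never defined, hence the error. Your avsc should be one of the following: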
{
"type": "record",
"name": "twitter_schema",
"namespace": "com.miguno.avro",
"fields": [
{
"name": "_id",
"type": {
"type": "record",
"name": "id_schema",
"namespace": "com.miguno.avro",
"fields": [
{
"name": "id_name",
"type": "string",
"doc": "Value of the indexes/id name tweets"
},
{
"name": "id_value",
"type": "string",
"doc": "Value of the indexes/id value tweets"
}
],
"doc": "A schema for storing Values of the indexes/id tweets"
},
"doc": "Values of the indexes/id tweets"
},
{
"name": "nome",
"type": "string",
"doc": "Name of the user account on Twitter.com"
},
{
"name": "tweet",
"type": "string",
"doc": "The content of the user's Twitter message"
},
{
"name": "datahora",
"type": "string",
"doc": "Unix epoch time in seconds"
}
],
"doc": "A schema for storing Twitter messages"
}
or
{
"type": "record",
"name": "twitter_schema",
"namespace": "com.miguno.avro",
"fields": [
{
"name": "_id",
"type": {
"type": "array",
"items": "string"
},
"doc": "Values of the indexes/id tweets"
},
{
"name": "nome",
"type": "string",
"doc": "Name of the user account on Twitter.com"
},
{
"name": "tweet",
"type": "string",
"doc": "The content of the user's Twitter message"
},
{
"name": "datahora",
"type": "string",
"doc": "Unix epoch time in seconds"
}
],
"doc": "A schema for storing Twitter messages"
}
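To make the rule from the top of this answer concrete, here is a minimal Python sketch that walks a schema and reports every bare type name that is neither a primitive nor a name defined earlier. It is not Avro's actual parser: `find_undefined_types` is a hypothetical helper written only for illustration, it ignores namespaces and full names, and the two schemas in the demo are trimmed-down versions of the ones in this thread.

```python
import json

# Avro primitive type names; a bare string naming one of these is always valid.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
# Complex types that introduce a name which later fields may refer to.
NAMED_TYPES = {"record", "enum", "fixed"}


def find_undefined_types(schema, defined=None, path="$"):
    """Return (path, type_name) pairs for bare names that are neither
    primitives nor names defined earlier in the schema.
    Simplification: namespaces are ignored, only short names are tracked."""
    if defined is None:
        defined = set()
    problems = []
    if isinstance(schema, str):
        # A bare string must be a primitive or an already defined name.
        if schema not in PRIMITIVES and schema not in defined:
            problems.append((path, schema))
    elif isinstance(schema, list):
        # A union: check every branch.
        for i, branch in enumerate(schema):
            problems += find_undefined_types(branch, defined, f"{path}[{i}]")
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in NAMED_TYPES:
            defined.add(schema["name"])  # register before visiting the fields
        if kind == "record":
            for field in schema.get("fields", []):
                problems += find_undefined_types(
                    field["type"], defined, f"{path}.{field['name']}")
        elif kind == "array":
            problems += find_undefined_types(schema["items"], defined, path + ".items")
        elif kind == "map":
            problems += find_undefined_types(schema["values"], defined, path + ".values")
        elif kind not in NAMED_TYPES:
            # e.g. {"type": "string"}: check the nested type itself.
            problems += find_undefined_types(kind, defined, path)
    return problems


# Trimmed version of the schema from the question: "_id" uses a bare "record".
broken = json.loads("""
{"type": "record", "name": "twitter_schema", "fields": [
  {"name": "_id", "type": "record"},
  {"name": "nome", "type": "string"}
]}
""")

# Trimmed version of the first fix: the nested record is defined inline.
fixed = json.loads("""
{"type": "record", "name": "twitter_schema", "fields": [
  {"name": "_id", "type": {"type": "record", "name": "id_schema", "fields": [
    {"name": "id_value", "type": "string"}
  ]}},
  {"name": "nome", "type": "string"}
]}
""")

print(find_undefined_types(broken))  # [('$._id', 'record')]
print(find_undefined_types(fixed))   # []
```

The broken schema is flagged at the `_id` field, the same field the `SchemaParseException` names, while the inline `{"type": "record", ...}` definition passes.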