How to resolve "The datum is not an example of schema" (Avro::IO::AvroTypeError)

Time: 2018-02-20 15:07:41

Tags: json ruby enums avro

I am new to Avro with Ruby, and to programming in general.

While performing some basic Avro operations in Ruby, I ran into a problem with my schema. Here is the code.

require 'rubygems'
require 'avro'  
require 'mysql2' 
require 'json' 
require 'multi_json'

# setup mysql
db = Mysql2::Client.new(:host => "localhost", :username =>"root",:password=> "root", :database => 'world')

file = File.open('C:/Avro-Spark-Inputs/Serialized_Avro/country_join.avro', 'wb')  

schema = Avro::Schema.parse(File.open("C:/Avro-Spark-Inputs/Schema/country.avsc", "rb").read)

writer = Avro::IO::DatumWriter.new(schema) 

dw = Avro::DataFile::Writer.new(file, writer, schema) 

results = db.query("SELECT * FROM country")

results.each do |row| 
  dw << row
end

# close the avro data file
dw.close

puts "Avro File Created Successfully"

Below is the schema I defined.

{
 "type" : "record",
 "namespace" : "country.avro",
 "name" : "country",
 "fields" : [
          {"name": "Code", "type": "string"},
          {"name": "Name", "type": "string"},
          {"name": "Continent", "type": {"name": "Continent", "type":"enum", "symbols": ["Asia", "Europe", "North America", "Africa", "Oceania", "Antarctica", "South America"]}},
          {"name": "Region", "type": "string"},
          {"name": "SurfaceArea", "type": "float"},
          {"name": "IndepYear", "type": "int"},
          {"name": "Population", "type": "int"},
          {"name": "LifeExpectancy", "type": "float"},
          {"name": "GNP", "type": "float"},
          {"name": "GNPOld", "type": "float"},
          {"name": "LocalName", "type": "string"},
          {"name": "GovernmentForm", "type": "string"},
          {"name": "HeadOfState", "type": "string"},
          {"name": "Capital", "type": "int"},
          {"name": "Code2", "type": "string"}
        ]
}

Error observed:

C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:547:in `write_data': The datum {"Code"=>"ABW", "Name"=>"Aruba", "Continent"=>"North America", "Region"=>"Caribbean", "SurfaceArea"=>193.0, "IndepYear"=>nil, "Population"=>103000, "LifeExpectancy"=>78.4, "GNP"=>828.0, "GNPOld"=>793.0, "LocalName"=>"Aruba", "GovernmentForm"=>"Nonmetropolitan Territory of The Netherlands", "HeadOfState"=>"Beatrix", "Capital"=>129, "Code2"=>"AW"} is not an example of schema {"type":"record","name":"country","namespace":"country.avro","fields":[{"name":"Code","type":"string"},{"name":"Name","type":"string"},{"name":"Continent","type":{"type":"enum","name":"Continent","namespace":"country.avro","symbols":["Asia","Europe","North America","Africa","Oceania","Antarctica","South America"]}},{"name":"Region","type":"string"},{"name":"SurfaceArea","type":"float"},{"name":"IndepYear","type":"int"},{"name":"Population","type":"int"},{"name":"LifeExpectancy","type":"float"},{"name":"GNP","type":"float"},{"name":"GNPOld","type":"float"},{"name":"LocalName","type":"string"},{"name":"GovernmentForm","type":"string"},{"name":"HeadOfState","type":"string"},{"name":"Capital","type":"int"},{"name":"Code2","type":"string"}]} (Avro::IO::AvroTypeError)
    from C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:542:in `write'
    from C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/data_file.rb:136:in `<<'
    from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:26:in `block in <main>'
    from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:25:in `each'
    from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:25:in `<main>'

I am stuck here and could not find an answer online (I must be asking a silly question).

1 Answer:

Answer 0: (score: 0)

I found the problem!

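The "IndepYear" column contains null values in the database, and I had not allowed for that in the Avro schema, where the field is declared as a plain "int". That mismatch is what produced the error above.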
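To make the fix concrete, here is a minimal sketch of how one might spot such fields and what a nullable declaration looks like. It uses only Ruby's standard `json` library, not the avro gem. The helper name `non_nullable_nil_fields` is hypothetical, and the two-field schema is a trimmed-down stand-in for the one in the question. It only inspects top-level record fields, so treat it as a diagnostic aid rather than a full Avro validator.

```ruby
require 'json'

# Hypothetical helper (not part of the avro gem): returns the names of
# top-level record fields whose value in `row` is nil although the
# schema type does not permit null.
def non_nullable_nil_fields(schema_json, row)
  fields = JSON.parse(schema_json).fetch('fields')
  offending = fields.select do |field|
    type = field['type']
    # A field accepts null if its type is "null" or a union containing "null".
    nullable = type == 'null' || (type.is_a?(Array) && type.include?('null'))
    row[field['name']].nil? && !nullable
  end
  offending.map { |field| field['name'] }
end

# Trimmed-down version of the schema from the question (assumption:
# only two fields are kept for brevity).
original_schema = <<~SCHEMA
  {
    "type": "record",
    "name": "country",
    "fields": [
      {"name": "Code", "type": "string"},
      {"name": "IndepYear", "type": "int"}
    ]
  }
SCHEMA

# Same schema with IndepYear declared as a union of null and int.
fixed_schema = <<~SCHEMA
  {
    "type": "record",
    "name": "country",
    "fields": [
      {"name": "Code", "type": "string"},
      {"name": "IndepYear", "type": ["null", "int"], "default": null}
    ]
  }
SCHEMA

# Shaped like the row from the error message, reduced to two columns.
row = { 'Code' => 'ABW', 'IndepYear' => nil }

p non_nullable_nil_fields(original_schema, row) # => ["IndepYear"]
p non_nullable_nil_fields(fixed_schema, row)    # => []
```

Any other column that can hold NULL in the database would presumably need the same union treatment in the .avsc file before `dw << row` accepts every row.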