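I am new to using Avro with Ruby, and to programming in general.
While performing some basic Avro operations in Ruby, I ran into a problem with my schema. The code is below.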
require 'rubygems'
require 'avro'
require 'mysql2'
require 'json'
require 'multi_json'
# setup mysql
db = Mysql2::Client.new(:host => "localhost", :username => "root", :password => "root", :database => 'world')
file = File.open('C:/Avro-Spark-Inputs/Serialized_Avro/country_join.avro', 'wb')
schema = Avro::Schema.parse(File.open("C:/Avro-Spark-Inputs/Schema/country.avsc", "rb").read)
writer = Avro::IO::DatumWriter.new(schema)
dw = Avro::DataFile::Writer.new(file, writer, schema)
results = db.query("SELECT * FROM country")
results.each do |row|
  dw << row
end
# close the avro data file
dw.close
puts "Avro File Created Successfully"
The schema is defined as follows.
{
"type" : "record",
"namespace" : "country.avro",
"name" : "country",
"fields" : [
{"name": "Code", "type": "string"},
{"name": "Name", "type": "string"},
{"name": "Continent", "type": {"name": "Continent", "type":"enum", "symbols": ["Asia", "Europe", "North America", "Africa", "Oceania", "Antarctica", "South America"]}},
{"name": "Region", "type": "string"},
{"name": "SurfaceArea", "type": "float"},
{"name": "IndepYear", "type": "int"},
{"name": "Population", "type": "int"},
{"name": "LifeExpectancy", "type": "float"},
{"name": "GNP", "type": "float"},
{"name": "GNPOld", "type": "float"},
{"name": "LocalName", "type": "string"},
{"name": "GovernmentForm", "type": "string"},
{"name": "HeadOfState", "type": "string"},
{"name": "Capital", "type": "int"},
{"name": "Code2", "type": "string"}
]
}
Observed error:
C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:547:in `write_data': The datum {"Code"=>"ABW", "Name"=>"Aruba", "Continent"=>"North America", "Region"=>"Caribbean", "SurfaceArea"=>193.0, "IndepYear"=>nil, "Population"=>103000, "LifeExpectancy"=>78.4, "GNP"=>828.0, "GNPOld"=>793.0, "LocalName"=>"Aruba", "GovernmentForm"=>"Nonmetropolitan Territory of The Netherlands", "HeadOfState"=>"Beatrix", "Capital"=>129, "Code2"=>"AW"} is not an example of schema {"type":"record","name":"country","namespace":"country.avro","fields":[{"name":"Code","type":"string"},{"name":"Name","type":"string"},{"name":"Continent","type":{"type":"enum","name":"Continent","namespace":"country.avro","symbols":["Asia","Europe","North America","Africa","Oceania","Antarctica","South America"]}},{"name":"Region","type":"string"},{"name":"SurfaceArea","type":"float"},{"name":"IndepYear","type":"int"},{"name":"Population","type":"int"},{"name":"LifeExpectancy","type":"float"},{"name":"GNP","type":"float"},{"name":"GNPOld","type":"float"},{"name":"LocalName","type":"string"},{"name":"GovernmentForm","type":"string"},{"name":"HeadOfState","type":"string"},{"name":"Capital","type":"int"},{"name":"Code2","type":"string"}]} (Avro::IO::AvroTypeError)
from C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:542:in `write'
from C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/data_file.rb:136:in `<<'
from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:26:in `block in <main>'
from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:25:in `each'
from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:25:in `<main>'
I am stuck here and could not find an answer online (this may well be a silly question).
Answer 0 (score: 0)
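I found the problem!
The "IndepYear" attribute has null values in the database, and I had not allowed for that in the Avro schema. A nil value does not match the type "int", which is what produced the error above.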
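For anyone hitting the same error, here is a minimal sketch of the fix and a way to spot the offending fields. In Avro, a field that may be null is declared as a union, written as a JSON array of types such as ["null", "int"]. The helper `non_nullable_nil_fields` is hypothetical (it is not part of the avro gem) and uses only the standard library; the two-field schemas are trimmed-down versions of the one in the question, used purely for illustration.

```ruby
require 'json'

# IndepYear redefined as a union so that it accepts either null or an int.
NULLABLE_INDEP_YEAR = JSON.parse('{"name": "IndepYear", "type": ["null", "int"]}')

# Hypothetical helper (not part of the avro gem): returns the names of the
# fields whose value in `row` is nil although the schema type does not
# allow "null". Only top-level record fields are inspected.
def non_nullable_nil_fields(schema_hash, row)
  offending = schema_hash.fetch('fields').select do |field|
    type = field['type']
    allows_null = type == 'null' || (type.is_a?(Array) && type.include?('null'))
    row[field['name']].nil? && !allows_null
  end
  offending.map { |field| field['name'] }
end

code_field = { 'name' => 'Code', 'type' => 'string' }

# Trimmed-down version of the original schema: IndepYear is a plain int.
original_schema = {
  'type' => 'record', 'name' => 'country',
  'fields' => [code_field, { 'name' => 'IndepYear', 'type' => 'int' }]
}

# Same schema with IndepYear declared as a nullable union.
fixed_schema = {
  'type' => 'record', 'name' => 'country',
  'fields' => [code_field, NULLABLE_INDEP_YEAR]
}

# A row shaped like the one in the error message (string keys, nil IndepYear).
row = { 'Code' => 'ABW', 'IndepYear' => nil }

puts non_nullable_nil_fields(original_schema, row).inspect  # ["IndepYear"]
puts non_nullable_nil_fields(fixed_schema, row).inspect     # []
```

In the .avsc file itself, the change amounts to replacing "type": "int" with "type": ["null", "int"] on the IndepYear field. Any other column that can be NULL in the database would presumably need the same treatment, and running a check like the one above over the query results before writing is one way to find them.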