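I have a JSON file of roughly 10 GB. Each line contains exactly one JSON document. What is the best way to convert it to Avro? Ideally, I would like each output file to hold a batch of documents (say 10M). As far as I understand, Avro supports storing multiple documents in the same file.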
Answer 0: (score: 3)
Answer 1: (score: 0)
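The simplest way to convert a large JSON file to Avro is to use avro-tools, which is available from the Avro website.
Once you have created a simple schema, the file can be converted directly.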
{
  "type": "record",
  "name": "cpc_schema",
  "namespace": "com.streambright.avro",
  "fields": [{
    "name": "section",
    "type": "string",
    "doc": "Section of the CPC"
  }, {
    "name": "class",
    "type": "string",
    "doc": "Class of the CPC"
  }, {
    "name": "subclass",
    "type": "string",
    "doc": "Subclass of the CPC"
  }, {
    "name": "main_group",
    "type": "string",
    "doc": "Main-group of the CPC"
  }, {
    "name": "subgroup",
    "type": "string",
    "doc": "Subgroup of the CPC"
  }, {
    "name": "classification_value",
    "type": "string",
    "doc": "Classification value of the CPC"
  }, {
    "name": "doc_number",
    "type": "string",
    "doc": "Patent doc_number"
  }, {
    "name": "updated_at",
    "type": "string",
    "doc": "Document update time"
  }],
  "doc": "A basic schema for CPC codes"
}
The JSON above is the example schema.
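To get a bounded number of documents per Avro file, as the question asks, one option is to split the line-delimited input first and then run avro-tools on each piece. The following is a minimal Python sketch of that idea, not a definitive implementation. The helper names (`split_ndjson`, `fromjson_command`, `convert`), the chunk file naming, and the paths `avro-tools.jar` and `cpc.avsc` are all assumptions made for illustration. It relies on the `fromjson` subcommand of avro-tools with its `--schema-file` option, which writes the Avro container file to standard output; check the help output of your avro-tools version before depending on it.

```python
import subprocess
from pathlib import Path


def split_ndjson(src, out_dir, docs_per_file):
    """Stream a one-document-per-line JSON file into chunk files.

    Each chunk holds at most docs_per_file documents; blank lines are skipped.
    Returns the list of chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    out = None
    count = 0
    try:
        with open(src, "r", encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                # Start a new chunk on the first document or when the current one is full.
                if out is None or count == docs_per_file:
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk-{len(chunks):05d}.json"  # hypothetical naming
                    out = open(path, "w", encoding="utf-8")
                    chunks.append(path)
                    count = 0
                out.write(line if line.endswith("\n") else line + "\n")
                count += 1
    finally:
        if out is not None:
            out.close()
    return chunks


def fromjson_command(jar, schema, chunk):
    """Build the avro-tools invocation for one chunk (jar and schema paths are assumptions)."""
    return ["java", "-jar", str(jar), "fromjson", "--schema-file", str(schema), str(chunk)]


def convert(jar, schema, chunk):
    """Run avro-tools on one chunk and capture its stdout as <chunk>.avro."""
    target = Path(chunk).with_suffix(".avro")
    with open(target, "wb") as out:
        subprocess.run(fromjson_command(jar, schema, chunk), stdout=out, check=True)
    return target


# Example usage (requires Java and a downloaded avro-tools jar):
# for chunk in split_ndjson("input.json", "chunks", 10_000_000):
#     convert("avro-tools.jar", "cpc.avsc", chunk)
```

Splitting before converting keeps memory use flat, because the input is streamed line by line and each avro-tools run only sees one chunk. Every record in the input must match the schema, so a malformed line will make the run for that chunk fail rather than be silently dropped.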