Suppose I have a schema like the following:
{
  "name": "phoneNumber",
  "type": {
    "type": "record",
    "name": "internalNumber",
    "namespace": "com.wiki",
    "fields": [{
      "name": "areacode",
      "type": "string"
    }, {
      "name": "phone",
      "type": ["null", "string"],
      "doc": "Actual full number",
      "default": null
    }]
  }
}
I have a CSV file that spreads this data across multiple columns, for example:
areaCode phoneNumber
+1 1234512345
How can I produce an Avro file like the following from a Pig script:
"phoneNumber" : {
"areacode" : "+1",
"phone" : "1234512345"
}
That is, with the two columns nested inside a single record.
Answer 0 (score: 0)
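REGISTER piggybank.jar;  -- adjust to wherever piggybank.jar lives on your system
raw = LOAD 'input_path' USING org.apache.pig.piggybank.storage.CSVLoader() AS (areaCode:chararray, phoneNumber:chararray);
A = FILTER raw BY areaCode != 'areaCode';  -- drop the header row; CSVLoader does not skip it
-- pack both columns into one tuple whose field names match the Avro schema (areacode, phone)
B = FOREACH A GENERATE TOTUPLE(areaCode, phoneNumber) AS phoneNumber:tuple(areacode:chararray, phone:chararray);
STORE B INTO 'output_path' USING AvroStorage();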
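You need CSVLoader from piggybank. On Pig versions before 0.12, AvroStorage also comes from piggybank (org.apache.pig.piggybank.storage.avro.AvroStorage); Pig 0.12 and later ship a built-in AvroStorage.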
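For reference, the per-row reshaping that the FOREACH step performs can be sketched in plain Python using only the standard library. This is an illustration of the flat-to-nested mapping, not a replacement for AvroStorage: it emits JSON rather than Avro, it assumes a comma-delimited file with a header row, and the helper names `to_nested` and `reshape` are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample input in the shape described in the question (comma delimiter assumed).
CSV_TEXT = "areaCode,phoneNumber\n+1,1234512345\n"


def to_nested(row):
    """Map one flat CSV row onto the nested phoneNumber record.

    Field names follow the Avro schema (areacode, phone); an empty
    phone column becomes None, matching the ["null", "string"] union.
    """
    return {
        "phoneNumber": {
            "areacode": row["areaCode"],
            "phone": row["phoneNumber"] or None,
        }
    }


def reshape(csv_text):
    # DictReader consumes the header row, so it never shows up as a data record.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [to_nested(row) for row in reader]


if __name__ == "__main__":
    for record in reshape(CSV_TEXT):
        print(json.dumps(record))
```

Each CSV row becomes one record with a single `phoneNumber` field holding the nested pair, which is the same shape the Pig tuple produces before AvroStorage serializes it.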