Pig: Writing record types from a single file to multiple outputs

Time: 2017-08-04 19:08:37

Tags: hadoop hive apache-pig

I have the following data in a single file:

"HD",003498,"20160913:17:04:10","D3ZYE",1
"EH","XXX-1985977-1",1,"01","20151215","20151215","20151229","20151215","2304",,,"36-126481000",1340.74,61808.00,1126.62,0.00,214.12,0.00,0.00,0.00,"30","20151229","00653845",,,"PARTS","001","ABI","20151215","Y","Y","N","36-126481000",

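I want to use Pig to read this single file and then split it into different files based on the first column. To do that, I have been looking for a way to first treat each record as having the following structure: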

recTypCd,recordData

and then later treat recordData as a CSV record.

Once the records are stored in separate files, one per record type, I can load each file into its own external Hive table using the CSV SerDe.

1 Answer:

Answer 0: (score: 0)

You can use SPLIT in Pig for your case.

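For example, split the loaded relation on recTypCd, sending each record type to its own relation: records where recTypCd == 'HD' go to hd1, records matching the next condition go to hd2, and so on. Note that the comparison is case-sensitive, so the literal must match the data ('HD', not 'hd').

Then store each relation in its own location: STORE hd1 INTO 'op1'; STORE hd2 INTO 'op2';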
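
The approach above can be sketched roughly as follows. This is a minimal sketch, not a tested solution: the input and output paths and the relation names (raw, recs, hd, eh, hd_data, eh_data) are hypothetical, and it assumes that every line starts with a double-quoted record type followed by a comma. Each line is loaded as a single string with TextLoader, cut into recTypCd and recordData with REGEX_EXTRACT, and then routed with SPLIT.

```pig
-- Load each line as one chararray so that records with different
-- column counts do not need a shared schema.
-- 'input/records.txt' is a hypothetical path.
raw = LOAD 'input/records.txt' USING TextLoader() AS (line:chararray);

-- Assumption: every line looks like "XX",<rest of the record>.
-- Group 1 is the record type without quotes, group 2 is the remaining CSV text.
recs = FOREACH raw GENERATE
    REGEX_EXTRACT(line, '^"([^"]*)",(.*)$', 1) AS recTypCd,
    REGEX_EXTRACT(line, '^"([^"]*)",(.*)$', 2) AS recordData;

-- Route each record type to its own relation (comparison is case-sensitive).
SPLIT recs INTO hd IF recTypCd == 'HD',
                eh IF recTypCd == 'EH';

-- Keep only the CSV payload so a CSV SerDe can parse the stored files.
hd_data = FOREACH hd GENERATE recordData;
eh_data = FOREACH eh GENERATE recordData;

-- 'output/HD' and 'output/EH' are hypothetical output directories.
STORE hd_data INTO 'output/HD';
STORE eh_data INTO 'output/EH';
```

Lines that do not match the pattern get a null recTypCd and fall into neither relation, so they are silently dropped in this sketch. With many record types, listing one SPLIT branch per type becomes tedious; the MultiStorage store function from piggybank may be worth a look in that case, although it is not shown here.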