When I try to load file.txt into Pig, I get the following error:
pig script failed to validate: java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[\|-\|]'
A sample line of the file is:
text|-|text|-|text
I use the following command:
bag = LOAD 'file.txt' USING PigStorage('\\|-\\|') AS (v1:chararray, v2:chararray, v3:chararray);
Is the problem the delimiter? Or my regular expression?
Answer 0 (score: 2)
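PigStorage expects a single-character field delimiter rather than a regular expression, which is why '\\|-\\|' is rejected. If you don't want to write a custom LOAD function, you can load your records using '-' as the delimiter, then add another step that removes every '|' from your fields.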
bag = LOAD 'file.txt' USING PigStorage('-') AS (v1:chararray, v2:chararray, v3:chararray);
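-- REPLACE treats its second argument as a regular expression, so the pipe is
-- wrapped in a character class; a bare '|' would match only the empty string.
bag_new = FOREACH bag GENERATE
REPLACE(v1,'[|]','') as v1_new,
REPLACE(v2,'[|]','') as v2_new,
REPLACE(v3,'[|]','') as v3_new;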
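Note that the two-step approach above also changes any field that itself contains '-' or '|'. A possible alternative, sketched here under the assumption that the built-in TextLoader and STRSPLIT functions are available in your Pig version, is to load each line whole and split it on the full delimiter. The relation name raw and the field name line are illustrative and not part of the original question.

```pig
-- Load every line as a single chararray; no field splitting happens here.
raw = LOAD 'file.txt' USING TextLoader() AS (line:chararray);

-- STRSPLIT takes a regular expression; '[|]-[|]' matches the literal text |-|.
-- The limit of 3 assumes exactly three fields per line, as in the sample.
bag = FOREACH raw GENERATE
    FLATTEN(STRSPLIT(line, '[|]-[|]', 3))
    AS (v1:chararray, v2:chararray, v3:chararray);
```

Because the split uses the whole three-character delimiter, a stray '-' or '|' inside a field should be left untouched.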