My question: does Pig have a built-in function to shuffle tuples or bags?
raw_record = LOAD '$inputPath'; -- USING com.test.parser.TestParser
record_project = FOREACH raw_record GENERATE
    field1,
    field2,
    field3,
    field4;
sl_record = FILTER record_project BY (field1 == '1' OR field1 == '2');
SPLIT sl_record INTO rec1 IF field1 == '1', rec2 IF field1 == '2';
rec2Sample = SAMPLE rec2 $samplingRate;
finalRec1 = FOREACH rec1 GENERATE
    -1,
    1,
    field1,
    field2,
    field3,
    field4;
finalRec2 = FOREACH rec2 GENERATE
    1,
    1,
    field1,
    field2,
    field3,
    field4;
unionRec = UNION finalRec1, finalRec2;
STORE unionRec INTO '$outputPath' USING PigStorage(',');
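In the example above, the problem is the UNION: the output contains all of finalRec1 followed by all of finalRec2. I need the two sets of rows shuffled together.
The workaround I adopted is: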
raw_record = LOAD '$inputPath'; -- USING com.test.parser.TestParser
record_project = FOREACH raw_record GENERATE
    field1,
    field2,
    field3,
    field4;
sl_record = FILTER record_project BY (field1 == '1' OR field1 == '2');
SPLIT sl_record INTO rec1 IF field1 == '1', rec2 IF field1 == '2';
rec2Sample = SAMPLE rec2 $samplingRate;
finalRec1 = FOREACH rec1 GENERATE
    -1,
    1,
    field1,
    field2,
    field3,
    field4,
    (chararray)RANDOM() AS id;
finalRec2 = FOREACH rec2 GENERATE
    1,
    1,
    field1,
    field2,
    field3,
    field4,
    (chararray)RANDOM() AS id;
unionRec = UNION finalRec1, finalRec2;
mixedRec = ORDER unionRec BY id ASC;
STORE mixedRec INTO '$outputPath' USING PigStorage(',');
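This mixes the rows, but now I cannot write a PigUnit test for the script. Is there a way to shuffle unionRec directly and still be able to write a PigUnit test?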
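For readers less familiar with the trick used in the workaround, here is a minimal plain-Java sketch of the same idea: tag every row with a random key, sort by that key, then drop the key. It does not use Pig at all. The class name, method name, and the `seed` parameter are assumptions made for illustration; the seed is only there to show that a fixed source of randomness makes the order reproducible, which is exactly what the Pig script lacks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class RandomKeyShuffle {
    // Pairs a row with the random key that is used only for ordering.
    private static final class Keyed {
        final double key;
        final String row;

        Keyed(double key, String row) {
            this.key = key;
            this.row = row;
        }
    }

    // Tags every row with a random key, sorts by the key, then drops the key.
    // The seed is a hypothetical parameter that makes the result repeatable.
    static List<String> shuffle(List<String> rows, long seed) {
        Random random = new Random(seed);
        List<Keyed> keyed = new ArrayList<>();
        for (String row : rows) {
            keyed.add(new Keyed(random.nextDouble(), row));
        }
        keyed.sort(Comparator.comparingDouble(k -> k.key));

        List<String> shuffled = new ArrayList<>();
        for (Keyed k : keyed) {
            shuffled.add(k.row);
        }
        return shuffled;
    }

    public static void main(String[] args) {
        // Made-up rows standing in for finalRec1 followed by finalRec2.
        List<String> rows = Arrays.asList("-1,1,1,a", "-1,1,1,b", "1,1,2,c", "1,1,2,d");
        System.out.println(shuffle(rows, 42L));
    }
}
```

With an unseeded random key, as in the script above, both the key values and the resulting order change on every run, so an exact, ordered comparison of the output cannot pass reliably.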
The test:
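@Test
public void myPigUnitTest() throws Exception {
    String[] inputs = new String[] {
        "inputPath=/src/test/resource/testFile.txt",
        "samplingRate=1",
        "outputPath=dummy"
    };
    // PigUnitUtil appears to be a project-specific helper, not part of PigUnit.
    PigTest pigTest = PigUnitUtil.createPigTest("pathToMyPigFile", inputs);
    String[] expectedUnion;    // expected rows were not given in the post
    String[] expectedMixedRec; // expected rows were not given in the post
    pigTest.assertOutput("unionRec", expectedUnion);
    pigTest.assertOutput("mixedRec", expectedMixedRec);
}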
The problem is that both unionRec and mixedRec contain random numbers, and the order of the mixed rows is unpredictable as well.
Answer 0: (score: 2)
I managed to come up with a workaround of my own:
[code snippet not preserved in the source]
Now I verify that unionRec contains all the required data.
Answer 1: (score: 1)
After shuffling the tuples, project the columns you want to check into a new alias [alias name not preserved in the source], so that the random id is no longer part of the output, and call assertOutputAnyOrder:
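Compares the expected result with the result of the last alias generated in the script. Order does not matter: the assertion passes as long as each expected row is found somewhere in the output, at any index.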
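To make the idea concrete, here is a small plain-Java sketch of an order-insensitive comparison that first drops the trailing random id. It is not PigUnit's implementation and does not call PigUnit; the class and method names are hypothetical, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AnyOrderCheck {
    // Removes the last comma-separated field (the random id) from a row.
    static String dropLastField(String row) {
        int cut = row.lastIndexOf(',');
        return cut < 0 ? row : row.substring(0, cut);
    }

    // True when both lists hold the same rows with the same multiplicities,
    // regardless of order. Sorting copies leaves the inputs untouched.
    static boolean sameRowsAnyOrder(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        // Made-up shuffled output; the last field plays the role of the random id.
        List<String> actual = new ArrayList<>();
        for (String row : Arrays.asList("1,1,2,x,0.73", "-1,1,1,y,0.12")) {
            actual.add(dropLastField(row));
        }
        List<String> expected = Arrays.asList("-1,1,1,y", "1,1,2,x");
        System.out.println(sameRowsAnyOrder(expected, actual)); // prints "true"
    }
}
```

Once the random column is projected away, the only remaining nondeterminism is row order, which an any-order assertion tolerates.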