Pig filter and getting the original dataset

Date: 2017-04-07 13:40:27

Tags: apache-pig

I have a Pig input file, loaded into a relation named total.

I need to filter for cornflakes with a price below 12, and then I need to use the original dataset for the next filter.

filter1 = FILTER total BY item == 'cornflakes' AND price < 12;

Now, after filter1, I need to use the original dataset for the next filter.

3 Answers:

Answer 0: (score: 0)

When you run the statement

filter1 = FILTER total BY item == 'cornflakes' AND price < 12;

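it does not change the original relation, total. Instead, it creates a new relation, filter1. You now have two relations, and you can refer to total anywhere in your script. For example: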

total = LOAD 'location_of_file' ...   -- total relation is created
filter1 = FILTER total BY item == 'cornflakes' AND price < 12; -- filter1 is created
...
filter2 = filter total by ... -- filter2 is created
...

/* Now count rows of original total (total is unchanged) */
grouped = group total by all;
total_row_count = foreach grouped generate COUNT(total) as cnt;
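As a concrete illustration of the point above, here is a minimal sketch in which two independent filters both read from total. The file path, the comma delimiter, and the second filter's condition are assumptions made up for this example; the schema is borrowed from the other answers.

```pig
-- Hypothetical path and delimiter; schema borrowed from the other answers
total = LOAD 'inventory.csv' USING PigStorage(',')
        AS (item_sl:int, item:chararray, type:chararray,
            manufacturer:chararray, price:int);

-- First filter: reads total and produces a new relation
filter1 = FILTER total BY item == 'cornflakes' AND price < 12;

-- Second filter: also reads the unfiltered total, not filter1
-- (this condition is only an illustration)
filter2 = FILTER total BY item == 'cornflakes' AND price >= 12;

DUMP filter1;
DUMP filter2;
```

Because each FILTER statement only defines a new alias, the order of filter1 and filter2 does not matter; both see every row of total.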

Answer 1: (score: 0)

Use SPLIT

total = LOAD '/output/systemhawk/file_inventory/test34.txt' USING PigStorage(',') AS (item_sl : int, item : chararray, type: chararray, manufacturer: chararray, price : int);
SPLIT total INTO filter1 IF (item == 'cornflakes' AND price < 12),filter2 OTHERWISE;
DUMP filter2;

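To check how the rows were distributed, the sketch below repeats the LOAD and SPLIT and then counts the rows in each branch and in total. The alias names ending in _grp and _cnt are introduced here for illustration only. As far as I know, OTHERWISE needs Pig 0.10 or later, so treat that as an assumption about your installation.

```pig
total = LOAD '/output/systemhawk/file_inventory/test34.txt' USING PigStorage(',')
        AS (item_sl:int, item:chararray, type:chararray,
            manufacturer:chararray, price:int);

SPLIT total INTO filter1 IF (item == 'cornflakes' AND price < 12),
                 filter2 OTHERWISE;

-- Row count of each branch (alias names are illustrative)
matched_grp = GROUP filter1 ALL;
matched_cnt = FOREACH matched_grp GENERATE COUNT_STAR(filter1) AS cnt;

rest_grp = GROUP filter2 ALL;
rest_cnt = FOREACH rest_grp GENERATE COUNT_STAR(filter2) AS cnt;

-- total is still available after the SPLIT
total_grp = GROUP total ALL;
total_cnt = FOREACH total_grp GENERATE COUNT_STAR(total) AS cnt;

DUMP matched_cnt;
DUMP rest_cnt;
DUMP total_cnt;
```

COUNT_STAR is used instead of COUNT so that rows whose first field is null are still counted.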

Answer 2: (score: 0)

Why not use SPLIT?

total = LOAD 'location_of_file' using PigStorage('\t') as (item_sl : int, item : chararray, type: chararray, manufacturer: chararray, price : int);
SPLIT total INTO f1_total IF (your condition), f2_total IF (your condition);

After this, you can use the two filtered sets as f1_total and f2_total. Apply whatever conditions your use case needs.
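One thing to keep in mind with two IF branches: the conditions are evaluated independently, so a row can land in both relations or in neither. The sketch below uses two made-up conditions that overlap on purpose; they are assumptions for illustration, not the asker's actual requirements.

```pig
total = LOAD 'location_of_file' USING PigStorage('\t')
        AS (item_sl:int, item:chararray, type:chararray,
            manufacturer:chararray, price:int);

-- Illustrative conditions that overlap:
--   cornflakes priced under 12 go to BOTH f1_total and f2_total
--   a non-cornflakes row priced 12 or more goes to NEITHER
SPLIT total INTO f1_total IF (item == 'cornflakes'),
                 f2_total IF (price < 12);

DUMP f1_total;
DUMP f2_total;
```

If every row should end up in exactly one relation, write the second condition as the negation of the first, or use OTHERWISE where your Pig version supports it.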