I have a Pig input file like the following:
I need to filter for cornflakes with a price below 12, and I also need to use the original dataset for the next filtering step.
In other words, after filter1 I need to go back to the original dataset for the next filter.
Answer 0 (score: 0)
When you run the statement
filter1 = FILTER total BY item == 'cornflakes' AND price < 12;
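it does not change the original relation, total. Instead, it creates a new relation, filter1, so you now have two relations. You can still access total anywhere in your script. For example: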
total = LOAD 'location_of_file' ... -- total relation is created
filter1 = FILTER total BY item == 'cornflakes' AND price < 12; -- filter1 is created
...
filter2 = filter total by ... -- filter2 is created
...
/* Now count rows of original total (total is unchanged) */
grouped = group total all; -- GROUP ... ALL collects every row of total into a single group
total_row_count = foreach grouped generate COUNT(total) as cnt;
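The key point of this answer, that FILTER derives a new relation and leaves its input untouched, can be modeled outside Pig. The Python below is not Pig code; it is a minimal sketch that assumes a relation can be treated as an immutable tuple of rows, reuses the field names from the LOAD schema in the next answer, and uses made-up sample rows in place of the real input file. The helper `pig_filter` is hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Field names follow the AS (...) schema used in the LOAD statements in this thread.
    item_sl: int
    item: str
    type: str
    manufacturer: str
    price: int


# Hypothetical sample data standing in for the real input file.
total = (
    Row(1, "cornflakes", "cereal", "acme", 10),
    Row(2, "cornflakes", "cereal", "acme", 15),
    Row(3, "oats", "cereal", "acme", 8),
)


def pig_filter(relation, predicate):
    """Hypothetical model of Pig's FILTER: returns a new relation, never modifies the input."""
    return tuple(row for row in relation if predicate(row))


filter1 = pig_filter(total, lambda r: r.item == "cornflakes" and r.price < 12)
filter2 = pig_filter(total, lambda r: r.item == "oats")

# Roughly what GROUP total ALL + COUNT(total) computes: total is still complete.
total_row_count = len(total)
print(len(filter1), len(filter2), total_row_count)  # 1 1 3
```

Because each filter reads from `total` rather than from the previous result, the order in which filter1 and filter2 are defined does not matter.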
Answer 1 (score: 0)
Use SPLIT:
total = LOAD '/output/systemhawk/file_inventory/test34.txt' USING PigStorage(',') AS (item_sl : int, item : chararray, type: chararray, manufacturer: chararray, price : int);
SPLIT total INTO filter1 IF (item == 'cornflakes' AND price < 12),filter2 OTHERWISE;
DUMP filter2;
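With OTHERWISE, SPLIT behaves like a partition: each row goes to filter1 if the condition holds and to filter2 otherwise. The sketch below models that in plain Python (not Pig), assuming hypothetical sample rows with no null fields so the condition is always simply true or false; `split_otherwise` is a hypothetical helper name.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Same fields as the AS (...) schema in the LOAD statement above.
    item_sl: int
    item: str
    type: str
    manufacturer: str
    price: int


# Hypothetical rows; no null fields, so every condition is simply True or False.
total = (
    Row(1, "cornflakes", "cereal", "acme", 10),
    Row(2, "cornflakes", "cereal", "acme", 15),
    Row(3, "oats", "cereal", "acme", 8),
)


def split_otherwise(relation, predicate):
    """Model of SPLIT r INTO a IF cond, b OTHERWISE: each row lands in exactly one output."""
    matched = tuple(row for row in relation if predicate(row))
    others = tuple(row for row in relation if not predicate(row))
    return matched, others


filter1, filter2 = split_otherwise(
    total, lambda r: r.item == "cornflakes" and r.price < 12
)

print([r.item_sl for r in filter1])  # [1]
print([r.item_sl for r in filter2])  # [2, 3]
```

Together the two outputs contain every row of `total` exactly once, and `total` itself is still available afterwards.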
Answer 2 (score: 0)
Why not use SPLIT?
total = LOAD 'location_of_file' using PigStorage('\t') as (item_sl : int, item : chararray, type: chararray, manufacturer: chararray, price : int);
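SPLIT total INTO f1_total IF (your condition), f2_total IF (your other condition);
After this, you can use the filtered sets as f1_total and f2_total (Pig aliases are case-sensitive, so spell them consistently). Replace the placeholder conditions with whatever your filters require. Each condition is evaluated independently, so a row can end up in both outputs or in neither.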
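When SPLIT uses two IF clauses instead of OTHERWISE, each condition is tested on every row independently. The Python sketch below (a model, not Pig) illustrates this with hypothetical sample rows and two made-up conditions standing in for the placeholders; the helper `split` is likewise hypothetical.

```python
from typing import Callable, Dict, NamedTuple, Tuple


class Row(NamedTuple):
    # Field names follow the AS (...) schema in the LOAD statement above.
    item_sl: int
    item: str
    type: str
    manufacturer: str
    price: int


# Hypothetical sample data.
total = (
    Row(1, "cornflakes", "cereal", "acme", 10),
    Row(2, "cornflakes", "cereal", "acme", 15),
    Row(3, "oats", "cereal", "acme", 8),
    Row(4, "oats", "cereal", "acme", 20),
)


def split(relation, **conditions: Callable[[Row], bool]) -> Dict[str, Tuple[Row, ...]]:
    """Model of SPLIT r INTO a IF c1, b IF c2: every condition is checked on every row."""
    return {
        name: tuple(row for row in relation if cond(row))
        for name, cond in conditions.items()
    }


outputs = split(
    total,
    f1_total=lambda r: r.item == "cornflakes",  # hypothetical condition 1
    f2_total=lambda r: r.price < 12,  # hypothetical condition 2
)

print([r.item_sl for r in outputs["f1_total"]])  # [1, 2]
print([r.item_sl for r in outputs["f2_total"]])  # [1, 3]
```

Row 1 satisfies both conditions and appears in both outputs, while row 4 satisfies neither and is dropped. If you need every row to land in exactly one output, the OTHERWISE form is the better fit.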