Question

I have a Pig input file that includes item and price fields.

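I need to filter for cornflakes rows with a price below 12, and I need the original dataset for the next filter.

filter1 = FILTER total BY item == 'cornflakes' AND price < 12;

Now, after filter1, I need to run the next filter against the original dataset.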

Answer 1

When you run the command

filter1 = FILTER total BY item == 'cornflakes' AND price < 12;

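it does not change the original relation, total. Instead, it creates a new relation, filter1. You now have two relations, and you can access total anywhere in the script. For example: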

total = LOAD 'location_of_file' ...   -- total relation is created
filter1 = FILTER total BY item == 'cornflakes' AND price < 12; -- filter1 is created
...
filter2 = filter total by ... -- filter2 is created
...

/* Now count rows of original total (total is unchanged) */
grouped = group total by all;
total_row_count = foreach grouped generate COUNT(total) as cnt;
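
The same idea with a concrete second condition might look like the sketch below. The file path, the delimiter, and the 'oats' condition are assumptions made up for illustration; the schema is borrowed from the LOAD statements in the other answers.

```pig
-- Hypothetical path and delimiter; schema borrowed from the other answers
total = LOAD 'inventory.txt' USING PigStorage(',')
        AS (item_sl:int, item:chararray, type:chararray, manufacturer:chararray, price:int);

-- First filter: reads total and produces a new relation, total is untouched
filter1 = FILTER total BY item == 'cornflakes' AND price < 12;

-- Second filter: also reads the full, unfiltered total
-- (the 'oats' condition is an invented example)
filter2 = FILTER total BY item == 'oats' AND price >= 12;

DUMP filter1;
DUMP filter2;
```

Both filters start from total, so the order in which they are written does not affect their results.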

Answer 2

Use SPLIT:

total = LOAD '/output/systemhawk/file_inventory/test34.txt' USING PigStorage(',') AS (item_sl : int, item : chararray, type: chararray, manufacturer: chararray, price : int);
SPLIT total INTO filter1 IF (item == 'cornflakes' AND price < 12),filter2 OTHERWISE;
DUMP filter2;

Answer 3

Why not use SPLIT?

total = LOAD 'location_of_file' using PigStorage('\t') as (item_sl : int, item : chararray, type: chararray, manufacturer: chararray, price : int);
SPLIT total INTO f1_total IF (your condition), f2_total IF (your conditions);

After this, you can use the filtered sets as f1_total and f2_total. Apply whatever conditions your requirements call for.
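
A minimal sketch of SPLIT with concrete conditions, assuming a tab-delimited file at a hypothetical path and the same schema as above. The relation names cheap_cornflakes and all_cornflakes are invented for this example. SPLIT conditions may overlap, so a row can land in more than one output relation, and a row that matches no condition lands in none.

```pig
-- Hypothetical path; schema as in the LOAD statement above
total = LOAD 'inventory.txt' USING PigStorage('\t')
        AS (item_sl:int, item:chararray, type:chararray, manufacturer:chararray, price:int);

-- Overlapping conditions: a cornflakes row priced below 12 goes to both relations
SPLIT total INTO cheap_cornflakes IF (item == 'cornflakes' AND price < 12),
                 all_cornflakes   IF (item == 'cornflakes');

-- total is still the full original dataset after the SPLIT
grouped = GROUP total ALL;
total_row_count = FOREACH grouped GENERATE COUNT(total) AS cnt;
DUMP total_row_count;
```

Like FILTER, SPLIT only reads total, so the original relation stays available for any later step.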

Pig filter and get original dataset

3 answers: