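Apache Pig: I have two datasets, X and Y, with the same schema. X has 222 records and Y has 70 records. I need to combine the two, that is, append one dataset to the other vertically. When I use UNION, the output contains more records than expected: Z = UNION X, Y; produces 888 records instead of the expected 292. Can anyone suggest what is going wrong?
Sample code: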
UI_INFO = FILTER LOGS BY hotel_book.step == 'UI_INFO';
LOGS_ITINERARY = FILTER LOGS BY hotel_book.step == 'CREATE_ITINERARY';
LOGS_JOINED = JOIN UI_INFO BY header.itinerary_id LEFT OUTER, LOGS_ITINERARY BY header.itinerary_id;
LOGS_BOOK_COL1 = FOREACH LOGS_JOINED {
    CURRENCY = LOGS_ITINERARY::hotel_book.itinerary.hotel.rooms.currency;
    GENERATE UI_INFO::header.date_time AS date_time,
             LOGS_ITINERARY::hotel_book.pay_at_hotel AS pay_at_hotel,
             UI_INFO::header.referrer AS referrer,
             UI_INFO::hotel_book.step AS stage,
             FLATTEN( (IsEmpty(CURRENCY) ? TOBAG('unknown') : CURRENCY) ) AS currency;
};
REMAINING_LOGS = FILTER LOGS BY (hotel_book.step == 'CREATE_ITINERARY' OR hotel_book.step == 'PROVISIONAL_BOOK');
LOGS_BOOK_COL2 = FOREACH REMAINING_LOGS {
    CURRENCY = hotel_book.itinerary.hotel.rooms.currency;
    GENERATE header.date_time AS date_time,
             hotel_book.pay_at_hotel AS pay_at_hotel,
             header.referrer AS referrer,
             hotel_book.step AS stage,
             FLATTEN( (IsEmpty(CURRENCY) ? TOBAG('unknown') : CURRENCY) ) AS currency;
};
LOGS_BOOK_COL = UNION LOGS_BOOK_COL1,LOGS_BOOK_COL2;
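
For what it's worth, UNION in Pig only concatenates its inputs, so the row count of the result should equal the sum of the input row counts. Counting each relation separately may show whether the extra rows already exist before the UNION, for example because the LEFT OUTER join matches several rows per key or because FLATTEN emits one row per element of the currency bag. A minimal sketch, assuming the relation names from the sample above; the *_GROUPED and *_COUNT aliases are hypothetical names introduced only for this check:

```pig
-- Count the rows of each input relation and of the UNION result.
-- The *_GROUPED and *_COUNT aliases are hypothetical, not part of the original script.
COL1_GROUPED = GROUP LOGS_BOOK_COL1 ALL;
COL1_COUNT = FOREACH COL1_GROUPED GENERATE COUNT_STAR(LOGS_BOOK_COL1);

COL2_GROUPED = GROUP LOGS_BOOK_COL2 ALL;
COL2_COUNT = FOREACH COL2_GROUPED GENERATE COUNT_STAR(LOGS_BOOK_COL2);

UNION_GROUPED = GROUP LOGS_BOOK_COL ALL;
UNION_COUNT = FOREACH UNION_GROUPED GENERATE COUNT_STAR(LOGS_BOOK_COL);

-- If the first two counts already add up to the third, the extra rows
-- come from the JOIN or FLATTEN steps rather than from UNION itself.
DUMP COL1_COUNT;
DUMP COL2_COUNT;
DUMP UNION_COUNT;
```

COUNT_STAR is used here rather than COUNT so that rows whose first field is null are still counted.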