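New Pig user here. I am converting MySQL statements to Pig and ran into the following problem. I have two tables that need to be joined, and the output involves a calculation on the joined values. I assume it must be a simple problem.
For example, the tables I am joining are machine1 and machinemeans. I could not find any syntax in the Pig manual for doing calculations in a join. Any suggestions?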
select region, os, group, f.machine, f.machine_users, f.machine_tm,
f.machine_users - g.users_per_machine outliers,
f.machine_tm - g.tm_per_machine outlying_tm,
tm_per_machine/(f.machine_tm+1) factor
from machine1 f
inner join machinemeans g using(region, os, group)
order by 4, 1, 2, 3
Thx
Update: Thanks, WinnieNicklaus. I tried your suggestion, but I get an error saying a scalar has more than one row in the output. Here is my code.
machine1 = LOAD 'S1' AS (
block:chararray,
region:chararray,
os:chararray,
group:int,
machine:int,
machine_users:int,
machine_tm:float
);
machinemeans = LOAD 'S2' AS (
region:chararray,
os:chararray,
group:int,
tot_machines:int,
tot_users:int,
users_per_machine:float,
tm_per_machine:float,
tm_per_user:float,
cnt_per_block:float,
cnt_per_user:float
);
imbalance = FOREACH (JOIN machine1 by (region,os,group),
machine2 by (region,os,group))
GENERATE
region,os,group,
machine1.machine,
machine1.machine_users,
machine1.machine_tm,
machine1.machine_users - machinemeans.users_per_machine,
machine1.machine_tm - machinemeans.tm_per_machine;
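Answer 0 (score: 0)
A single SQL query may require several Pig Latin statements. The calculation you are referring to is not really part of the SQL join; it is actually in the select clause, and select ... from ... in SQL corresponds roughly to FOREACH ... GENERATE ... in Pig. So you FOREACH over the result of a JOIN. For example: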
result =
FOREACH (
JOIN table1 BY key1, table2 BY key2
) GENERATE
-- after a JOIN, fields are referenced as relation::field
table1::field1,
table1::field2,
table2::field3,
table1::field4 - table2::field5;
If you need a calculation to produce the join keys but do not care about them afterwards, you can even do
JOIN table1 BY (field1+field4), table2 BY myUDF(field3);
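On the scalar error reported in the update: as far as I can tell, it comes from writing machine1.machine inside the FOREACH. Outside a nested block, Pig treats relation.field as a scalar projection, which only works when the relation has exactly one row. Fields coming out of a JOIN are addressed with the disambiguation operator instead, as machine1::machine. Below is a rough sketch of the whole query under that reading. The names grp, joined and sorted are my own; I renamed group to grp only as a precaution, because GROUP is also a Pig keyword, and I split the JOIN into its own statement for readability.

```pig
-- Sketch only: 'S1' and 'S2' are the paths from the question;
-- grp, joined and sorted are illustrative names, not from the thread.
machine1 = LOAD 'S1' AS (
    block:chararray, region:chararray, os:chararray, grp:int,
    machine:int, machine_users:int, machine_tm:float
);
machinemeans = LOAD 'S2' AS (
    region:chararray, os:chararray, grp:int,
    tot_machines:int, tot_users:int,
    users_per_machine:float, tm_per_machine:float, tm_per_user:float,
    cnt_per_block:float, cnt_per_user:float
);

-- Equivalent of: inner join ... using(region, os, group)
joined = JOIN machine1 BY (region, os, grp),
              machinemeans BY (region, os, grp);

-- Equivalent of the select list; :: picks a field from one side of the join
imbalance = FOREACH joined GENERATE
    machine1::region AS region,
    machine1::os AS os,
    machine1::grp AS grp,
    machine1::machine AS machine,
    machine1::machine_users AS machine_users,
    machine1::machine_tm AS machine_tm,
    machine1::machine_users - machinemeans::users_per_machine AS outliers,
    machine1::machine_tm - machinemeans::tm_per_machine AS outlying_tm,
    machinemeans::tm_per_machine / (machine1::machine_tm + 1) AS factor;

-- Equivalent of: order by 4, 1, 2, 3
sorted = ORDER imbalance BY machine, region, os, grp;
```

Field names that are unique after the join, such as machine or users_per_machine, can usually be written without the prefix, but region, os and grp exist on both sides and need it.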