I have a table with two columns (code:chararray, sp:double).
I want to split the second field, sp, into different bins based on conditions such as (<25), (>25 and <45), and (>=45).
INPUT
code sp
t001 60.0
t001 75.0
a003 34.0
t001 60.0
a003 23.0
a003 23.0
t001 45.0
t001 10.0
t001 8.0
a003 20.0
t001 38.0
a003 55.0
a003 50.0
t001 08.0
a003 44.0
Desired output:
code   bin1    bin2         bin3
       (<25)   (>25 <45)    (>=45)
t001   3       1            4
a003   3       2            2
I am trying the script below:
data = LOAD 'Sandy/rd.csv' using PigStorage(',') As (code:chararray,sp:double);
data2 = DISTINCT data;
selfiltnew = FOREACH data2 generate code, sp;
group_new = GROUP selfiltnew by (code,sp);
newselt = FOREACH group_new GENERATE selfiltnew.code AS code,selfiltnew.sp AS sp;
bin1 = filter newselt by sp < 25.0;
grp1 = FOREACH bin1 GENERATE newselt.code AS code, COUNT(newselt.sp) AS (sp1:double);
bin2 = filter newselt by sp < 45 and sp >= 25;
grp2 = FOREACH bin3 GENERATE newselt.code AS code, COUNT(newselt.sp) AS (sp2:double);
bin3 = filter newselt by sp >=75;
grp3 = FOREACH bin3 GENERATE newselt.code AS code, COUNT(newselt.sp) AS (sp3:double);
newbin = JOIN grp1 by code,grp2 by code,grp3 by code;
newtable = FOREACH newbin GENERATE grp1::group.code AS code, SUM(sp1) AS bin1,SUM(sp2) AS bin2,SUM(sp3) AS bin3;
data2 = FOREACH newtable GENERATE code, bin1, bin2, bin3;
dump newtable;
How can I get the correct output using Pig Latin?
Answer 0 (score: 0)
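You must use GROUP BY before using COUNT.
Source: COUNT, Usage:
Use the COUNT function to compute the number of elements in a bag. COUNT requires a preceding GROUP ALL statement for global counts and a GROUP BY statement for group counts.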
-- assumes newselt holds scalar fields (code, sp)
bin1 = FILTER newselt BY sp < 25.0;
grouped1 = GROUP bin1 BY code;
-- after GROUP, the bag is named bin1; COUNT returns a long
grp1 = FOREACH grouped1 GENERATE group AS code, COUNT(bin1) AS sp1;
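The same GROUP-then-COUNT idea can cover all three bins in one pass by filtering inside a nested FOREACH. This is a minimal sketch, not the answerer's code: the aliases by_code, under25, from25to45, and atleast45 are made up for illustration, the file path and comma delimiter are taken from the question, and the value 25 itself is assumed to belong to the middle bin.

```pig
-- Sketch only: load the raw rows without DISTINCT so duplicates are counted
data = LOAD 'Sandy/rd.csv' USING PigStorage(',') AS (code:chararray, sp:double);

-- One group per code; each group carries a bag named data
by_code = GROUP data BY code;

-- Filter the per-code bag into three bins and count each one
counts = FOREACH by_code {
    under25    = FILTER data BY sp < 25.0;
    from25to45 = FILTER data BY sp >= 25.0 AND sp < 45.0;  -- assumption: 25 goes here
    atleast45  = FILTER data BY sp >= 45.0;
    GENERATE group AS code,
             COUNT(under25)    AS bin1,
             COUNT(from25to45) AS bin2,
             COUNT(atleast45)  AS bin3;
};

DUMP counts;
```

Because every code gets exactly one group, a code that has no rows in one of the bins still appears in the result, whereas an inner JOIN of three per-bin relations would drop it.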
Answer 1 (score: 0)
Looking at the desired output, DISTINCT is not needed. You also don't need some of the steps you are following. Note that if the source is space-delimited, you should use PigStorage(' ') instead of PigStorage(',').
Following @inquisitive_mind's guidance, the code is as follows:
Here is the output:
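A minimal sketch of the approach the answers describe (skip DISTINCT, group by code before aggregating), not the answerer's original script. It swaps in a different technique, a bincond flag per bin followed by SUM, and assumes the path and comma delimiter from the question, the hypothetical aliases flagged and by_code, and that the value 25 itself falls in the middle bin.

```pig
-- Sketch only: use PigStorage(' ') instead if the file is space-delimited
data = LOAD 'Sandy/rd.csv' USING PigStorage(',') AS (code:chararray, sp:double);

-- Turn each row into three 0/1 flags, one per bin
flagged = FOREACH data GENERATE code,
    (sp < 25.0 ? 1 : 0)                  AS b1,
    ((sp >= 25.0 AND sp < 45.0) ? 1 : 0) AS b2,  -- assumption: 25 goes here
    (sp >= 45.0 ? 1 : 0)                 AS b3;

-- Group by code, then add up the flags to get the per-bin counts
by_code = GROUP flagged BY code;
result = FOREACH by_code GENERATE group AS code,
    SUM(flagged.b1) AS bin1,
    SUM(flagged.b2) AS bin2,
    SUM(flagged.b3) AS bin3;

DUMP result;
```

On the sample input from the question, the counts should come out as 3, 1, 4 for t001 and 3, 2, 2 for a003, matching the desired output. The order of the rows in the DUMP is not guaranteed.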