Hi experts,
I have this dataset:
Field_A  Field_B  DATE
John     1        01-01-2016
John     1        05-01-2016
Cate     1        05-01-2016
Cate     4        01-01-2016
Cate     6        05-01-2016
Perdi    4        01-01-2016
I am trying to compute COUNT(*) for each Field_A and create a rank based on Field_A and DATE. Basically, I want to return this:
Field_A  Count  Rank  Field_B
John     2      1     1
John     2      2     1
Cate     3      3     1
Cate     3      4     4
Cate     3      3     6
Perdi    1      5     4
To do this, I am trying this code:
DATA = load '...'
    AS
    (Field_A:Int,
    FIELD_B:Int,
    DATE:CHARARRAY);
A = rank DATA BY Field_A;
B = GROUP A BY $0;
C = foreach B {
    CNT = COUNT(A.Field_A);
    generate $0, CNT;
}
D = join A by $0, C by $0;
E = rank D BY DATE,Field_A DENSE;
F = foreach E generate $0 AS RANK,Field_A,CNT;
DUMP F;
But I get an error when I run it.
How can I solve this?
Many thanks!
Answer 0 (score: 1)
C = foreach B {
generate group as Field_A, COUNT(A) as CNT;
}
Answer 1 (score: 0)
I changed Field_A to CHARARRAY and used a '\t'-delimited file. I am not impressed with my solution because it takes so many statements, but it works:
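-- Load the tab-delimited file; Field_A holds names, so it is a CHARARRAY.
A = LOAD '/user/root/datex.txt' USING PigStorage('\t') AS (Field_A:CHARARRAY, FIELD_B:Int, DATE:CHARARRAY);
-- Parse DATE so that the ranking below sorts chronologically.
B = FOREACH A GENERATE Field_A, FIELD_B, ToDate(DATE,'MM-dd-yyyy') AS Datex;
-- Count the rows for each Field_A.
D = GROUP B BY Field_A;
E = FOREACH D GENERATE group, COUNT(B.Field_A);
-- Attach each count to its rows and keep: name, count, FIELD_B, Datex.
F = JOIN E BY $0, B BY Field_A;
G = FOREACH F GENERATE $0, $1, $3, $4;
-- Rank by name, then by date.
H = RANK G BY $0, $3;
Last = FOREACH H GENERATE $1 AS FIELD_A, $2 AS CNT, $0 AS Rank, $3 AS FIELD_B;
DUMP Last;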
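To sanity-check what a script like the one above should produce on the sample data, here is a minimal Python sketch of the same count-then-rank logic. It is not Pig and not part of the original answer. The function name count_and_rank is hypothetical, and the sketch assumes that a non-dense RANK gives tied rows the same rank and then skips ahead, and that the dates use the MM-dd-yyyy pattern from the script.

```python
from collections import Counter
from datetime import datetime

# Sample rows from the question: (Field_A, Field_B, DATE).
rows = [
    ("John", 1, "01-01-2016"),
    ("John", 1, "05-01-2016"),
    ("Cate", 1, "05-01-2016"),
    ("Cate", 4, "01-01-2016"),
    ("Cate", 6, "05-01-2016"),
    ("Perdi", 4, "01-01-2016"),
]


def count_and_rank(rows):
    """Return (Field_A, count, rank, Field_B) tuples ordered by name, then date."""
    # Count of rows per Field_A (the GROUP + COUNT step).
    counts = Counter(name for name, _, _ in rows)

    def sort_key(row):
        # Assumption: dates follow the MM-dd-yyyy pattern used in the script.
        return (row[0], datetime.strptime(row[2], "%m-%d-%Y"))

    ordered = sorted(rows, key=sort_key)

    result = []
    previous_key, rank = None, 0
    for position, row in enumerate(ordered, start=1):
        key = sort_key(row)
        # Assumption: tied rows share a rank and the next rank skips ahead.
        if key != previous_key:
            rank = position
            previous_key = key
        result.append((row[0], counts[row[0]], rank, row[1]))
    return result


for line in count_and_rank(rows):
    print(line)
```

Note that this follows the ordering used in the script (Field_A first, then date), which is not the same as the rank column shown in the question's desired output.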