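I have an RDD like the one below, and I want to sum the amount each user spent within 15-day intervals for each year. My data looks like this. In the 15_days column, 0 means the user did not spend within 15 days and 1 means the user did spend within 15 days.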
user year 15_days amount
vimal 2013 0 10
vimal 2013 1 15
vimal 2013 1 12
vimal 2013 0 14
vimal 2014 1 10
vimal 2013 0 14
vimal 2014 1 10
vimal 2014 1 05
vimal 2014 0 05
vimal 2014 0 10
I tried the following code, but did not get the result I expected:
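// assumes a SparkSession named spark is in scope
import spark.implicits._   // needed for the $"..." column syntax
val data_new = data.select($"user", $"year", $"15_days", $"amount").rdd
  .groupBy(x => (x.getString(0), x.getInt(1)))
  .map { case ((user, yr), rows) =>
    var amt_sum: Float = 0.0F
    var prev_flag: Int = 0   // 15_days value of the previous row
    var no_times: Int = 0
    for (x <- rows) {
      val days_15 = x.getInt(2)   // a Scala identifier cannot start with a digit
      val amount = x.getFloat(3)
      // keep adding only while consecutive rows are flagged 1
      if (days_15 == 1 && prev_flag == 1) {
        amt_sum += amount
      } else {
        amt_sum = amount
      }
      if (amt_sum >= 25) {
        no_times += 1
      } else {
        no_times = 0   // resets the count whenever the sum is below 25
      }
      prev_flag = days_15
    }
    (user, yr, no_times)
  }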
The intermediate result I expect is as follows:
user year 15_days amount amt_sum
vimal 2013 0 10 10
vimal 2013 1 15 15
vimal 2013 1 12 27
vimal 2013 0 14 14
vimal 2014 1 10 10
vimal 2013 0 14 14
vimal 2014 1 10 10
vimal 2014 1 25 35
vimal 2014 0 05 0
vimal 2014 0 10 0
The expected final result is as follows:
vimal 2013 1
vimal 2014 1
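
A minimal sketch of the logic described above, written with plain Scala collections instead of Spark so it can be run on its own. It assumes the rule that the amt_sum column suggests: the running sum continues only when both the current and the previous row have 15_days equal to 1, and restarts otherwise. The threshold of 25 comes from the `amt_sum>=25` check in the attempt. The names `Spend`, `RunningSpend`, `runningSums` and `countPerUserYear` are hypothetical and exist only for illustration.

```scala
// Hypothetical record type; the fields mirror the columns in the question.
final case class Spend(user: String, year: Int, flag15: Int, amount: Float)

object RunningSpend {
  // Running sum over rows in their given order: add to the previous sum only
  // when the current and the previous row are both flagged 1, else restart.
  def runningSums(rows: Seq[Spend]): Seq[Float] = {
    // accumulator: (sums so far, previous flag, previous sum)
    val init = (Vector.empty[Float], 0, 0.0f)
    rows.foldLeft(init) { case ((acc, prevFlag, prevSum), r) =>
      val sum =
        if (r.flag15 == 1 && prevFlag == 1) prevSum + r.amount
        else r.amount
      (acc :+ sum, r.flag15, sum)
    }._1
  }

  // For each (user, year), count how many running sums reach the threshold.
  def countPerUserYear(rows: Seq[Spend], threshold: Float = 25f): Map[(String, Int), Int] =
    rows.groupBy(r => (r.user, r.year)).map { case (key, group) =>
      key -> runningSums(group).count(_ >= threshold)
    }
}
```

Unlike the attempt, the count here is never reset to 0, so an earlier hit is not lost when a later sum is below the threshold. Note that `groupBy` on a Spark RDD does not guarantee the order of rows inside each group, so applying the same idea to the RDD would likely need an explicit ordering column (for example a date) to sort by first.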