I have an aggregation query of the following form:
db.mycollection.aggregate([
{
"$match":
{
"Time": { $gte: ISODate("2016-01-30T00:00:00.000+0000") }
}
},
{
"$group":
{
"_id":
{
"day": { "$dayOfYear": "$Time" },
"hour": { "$hour": "$Time" }
},
"Dishes": { "$addToSet": "$Dish" }
}
},
{
"$group":
{
"_id": "$_id.hour",
"Food":
{
"$push":
{
"Day": "$_id.day",
"NumberOfDishes": { "$size":"$Dishes" }
}
}
}
},
{
"$project":
{
"Hour": "$_id",
"Food": "$Food",
"_id" : 0
}
},
{
"$sort": { "Hour": 1 }
}
]);
Instead of grouping by one-hour intervals as above (0-1, 1-2, 2-3, 3-4, 4-5, ..., 23-24), I would like to group by two-hour intervals (0-2, 2-4, 4-6, ..., 22-24). Is there a way to do this?
Answer 0 (score: 4)
Hint: use arithmetic aggregation operators in a $project stage.
Let H = floor(hour / 2), where hour is the actual hour of the document's date. You can compute H with the $floor and $divide operators:
"H": { $floor: { $divide: [ { "$hour": "$Time" }, 2 ] } }
Here H identifies a pair of hours (Hours=[0,2) => H=0, Hours=[2,4) => H=1, ..., Hours=[22,24) => H=11), and you can pass it to the $group stage with
$group: { "_id": { "day": { $dayOfYear: "$Time" }, "H": "$H" } }
Then you can output the hour range for a specific H using
"Hours": [ { $multiply: [ "$H", 2 ] }, { $sum: [ { $multiply: [ "$H", 2 ] }, 2 ] } ]
Given the collection of documents
{ "Time" : ISODate("2016-01-30T01:00:00Z"), "Dish" : "dish1" }
{ "Time" : ISODate("2016-01-30T02:00:00Z"), "Dish" : "dish2" }
{ "Time" : ISODate("2016-01-30T03:00:00Z"), "Dish" : "dish3" }
{ "Time" : ISODate("2016-01-30T04:00:00Z"), "Dish" : "dish4" }
{ "Time" : ISODate("2016-01-30T05:00:00Z"), "Dish" : "dish5" }
{ "Time" : ISODate("2016-01-30T06:00:00Z"), "Dish" : "dish6" }
{ "Time" : ISODate("2016-01-30T07:00:00Z"), "Dish" : "dish7" }
{ "Time" : ISODate("2016-01-30T08:00:00Z"), "Dish" : "dish8" }
{ "Time" : ISODate("2016-01-30T09:00:00Z"), "Dish" : "dish9" }
the following aggregation groups the dishes into two-hour buckets:
db.mycollection.aggregate([
{
"$match":
{
"Time": { $gte: ISODate("2016-01-30T00:00:00.000+0000") }
}
},
{
"$project":
{
"Dish": 1,
"Time": 1,
"H": { $floor: { $divide: [ { $hour: "$Time" }, 2 ] } }
}
},
{
"$group":
{
"_id":
{
"day": { $dayOfYear: "$Time" },
"H": "$H"
},
"Dishes": { $addToSet: "$Dish" }
}
},
{
"$group":
{
"_id": "$_id.H",
"Food":
{
"$push":
{
"Day": "$_id.day",
"NumberOfDishes": { $size: "$Dishes" }
}
}
}
},
{
"$sort": { "_id": 1 }
},
{
"$project":
{
"Hours": [ { $multiply: [ "$_id", 2 ] }, { $sum: [ { $multiply: [ "$_id", 2 ] }, 2 ] } ],
"Food": "$Food",
"_id": 0
}
}
]);
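The bucket arithmetic can be checked outside the database. The sketch below is plain JavaScript and mirrors the $floor / $divide and $multiply / $sum expressions from the pipeline. The helpers hourBucket and bucketExpressions, and the bucketSize parameter, are hypothetical names introduced here for illustration; they are not part of the original answer or of MongoDB. It assumes the bucket size divides 24 evenly, so that the ranges tile the whole day.

```javascript
// Hypothetical helper (illustration only): maps an hour of the day (0-23)
// to its bucket index H and the [start, end) hour range of that bucket.
function hourBucket(hour, bucketSize = 2) {
  const H = Math.floor(hour / bucketSize); // same arithmetic as $floor + $divide
  return { H, Hours: [H * bucketSize, H * bucketSize + bucketSize] };
}

// Hypothetical helper (illustration only): builds the two expressions used in
// the $project stages for a given bucket size. "H" reads the hour from "$Time";
// "Hours" assumes the bucket index has already become "_id" after grouping.
function bucketExpressions(bucketSize = 2) {
  return {
    H: { $floor: { $divide: [{ $hour: "$Time" }, bucketSize] } },
    Hours: [
      { $multiply: ["$_id", bucketSize] },
      { $sum: [{ $multiply: ["$_id", bucketSize] }, bucketSize] },
    ],
  };
}

// Hours of the sample documents above (01:00 through 09:00).
for (const hour of [1, 2, 3, 4, 5, 6, 7, 8, 9]) {
  console.log(hour, JSON.stringify(hourBucket(hour)));
}
```

Under the same assumption, passing a different divisor such as 3 or 4 to bucketExpressions would give three-hour or four-hour windows without changing the rest of the pipeline.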