I have the following DataFrame in Spark (Scala):
group nodeId date
1 1 2016-10-12T12:10:00.000Z
1 2 2016-10-12T12:00:00.000Z
1 3 2016-10-12T12:05:00.000Z
2 1 2016-10-12T12:30:00.000Z
2 2 2016-10-12T12:35:00.000Z
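I need to group the records by group, sort them by date in ascending order within each group, and produce pairs of consecutive nodeId values. In addition, date should be converted to Unix epoch time.
The expected output explains this better: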
group nodeId_1 nodeId_2 date
1 2 3 2016-10-12T12:00:00.000Z
1 3 1 2016-10-12T12:05:00.000Z
2 1 2 2016-10-12T12:30:00.000Z
This is what I have done so far:
df
.groupBy("group")
.agg($"nodeId",$"date")
.orderBy(asc("date"))
But I do not know how to create the nodeId pairs.
Answer 0 (score: 1)
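You can use a Window function together with the built-in lead function to create the pairs, and the built-in to_utc_timestamp function to convert the date column. Note that to_utc_timestamp returns a timestamp shifted to UTC, not a numeric Unix epoch; if you need epoch seconds, the built-in unix_timestamp function is the one to look at. Finally, filter out the unpaired rows, since you do not need them in the output.
The program below follows that explanation. I have added comments for clarity.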
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// window partitioned by group and ordered by date
val windowSpec = Window.partitionBy("group").orderBy("date")

df.withColumn("date", to_utc_timestamp(col("date"), "Asia/Kathmandu")) // treat date as Asia/Kathmandu time and shift it to UTC; choose another time zone as required
  .withColumn("nodeId_2", lead("nodeId", 1).over(windowSpec)) // nodeId of the next row in the window, which forms the pair
  .filter(col("nodeId_2").isNotNull) // drop the last row of each group, which has no partner
  .select(col("group"), col("nodeId").as("nodeId_1"), col("nodeId_2"), col("date")) // columns of the final DataFrame
  .show(false)
You should get the final DataFrame as required:
+-----+--------+--------+-------------------+
|group|nodeId_1|nodeId_2|date               |
+-----+--------+--------+-------------------+
|1    |2       |3       |2016-10-12 12:00:00|
|1    |3       |1       |2016-10-12 12:05:00|
|2    |1       |2       |2016-10-12 12:30:00|
+-----+--------+--------+-------------------+
I hope the answer is helpful.
Note: I used Asia/Kathmandu as the time zone in order to get the correct date values.
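To make the pairing and the epoch conversion concrete without a Spark session, here is a minimal plain-Scala sketch of the same idea on ordinary collections. It is an illustration only, not Spark code: the names LeadPairsSketch, Row, Pair, toEpochSeconds and pairs are hypothetical, and it assumes every date is an ISO-8601 string ending in Z, as in the question.

```scala
import java.time.Instant

object LeadPairsSketch {
  // Hypothetical row type mirroring the question's columns.
  final case class Row(group: Int, nodeId: Int, date: String)
  // Hypothetical output type: a node, the next node by date, and the first node's epoch seconds.
  final case class Pair(group: Int, nodeId1: Int, nodeId2: Int, epochSeconds: Long)

  // ISO-8601 instant (e.g. 2016-10-12T12:10:00.000Z) -> seconds since 1970-01-01T00:00:00Z.
  def toEpochSeconds(date: String): Long = Instant.parse(date).getEpochSecond

  def pairs(rows: Seq[Row]): Seq[Pair] =
    rows
      .groupBy(_.group)   // plays the role of Window.partitionBy("group")
      .toSeq
      .sortBy(_._1)       // deterministic group order for display
      .flatMap { case (_, groupRows) =>
        // plays the role of orderBy("date")
        val ordered = groupRows.sortBy(r => toEpochSeconds(r.date))
        // Zip each row with its successor. The last row has no successor and is
        // dropped, which corresponds to lead(...) followed by the isNotNull filter.
        ordered.zip(ordered.drop(1)).map { case (a, b) =>
          Pair(a.group, a.nodeId, b.nodeId, toEpochSeconds(a.date))
        }
      }
}
```

Called on the five rows from the question, pairs yields the node pairs (2, 3) and (3, 1) for group 1 and (1, 2) for group 2, each carrying the epoch seconds of the first node's date. That matches the three rows of the expected output, with the date expressed as a number.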
Answer 1 (score: 0)
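If I understand your requirement correctly, you can use a self-join on group with a < inequality condition on nodeId:
// assumes a SparkSession named spark, as in spark-shell
import org.apache.spark.sql.functions.when
import spark.implicits._

val df = Seq(
  (1, 1, "2016-10-12T12:10:00.000Z"),
  (1, 2, "2016-10-12T12:00:00.000Z"),
  (1, 3, "2016-10-12T12:05:00.000Z"),
  (2, 1, "2016-10-12T12:30:00.000Z"),
  (2, 2, "2016-10-12T12:35:00.000Z")
).toDF("group", "nodeId", "date")

// join each row with every row of the same group that has a larger nodeId
df.as("df1").join(
  df.as("df2"),
  $"df1.group" === $"df2.group" && $"df1.nodeId" < $"df2.nodeId"
).select(
  $"df1.group", $"df1.nodeId", $"df2.nodeId",
  // keep the earlier of the two dates
  when($"df1.date" < $"df2.date", $"df1.date").otherwise($"df2.date").as("date")
).show(false)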
// +-----+------+------+------------------------+
// |group|nodeId|nodeId|date                    |
// +-----+------+------+------------------------+
// |1    |1     |3     |2016-10-12T12:05:00.000Z|
// |1    |1     |2     |2016-10-12T12:00:00.000Z|
// |1    |2     |3     |2016-10-12T12:00:00.000Z|
// |2    |1     |2     |2016-10-12T12:30:00.000Z|
// +-----+------+------+------------------------+
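For comparison, the semantics of this self-join can be sketched on plain Scala collections. This is a hypothetical illustration, not Spark code: SelfJoinSketch, Row and allPairs are made-up names, and comparing the dates as strings is only valid on the assumption that they all share the same ISO-8601 format.

```scala
object SelfJoinSketch {
  // Hypothetical row type mirroring the question's columns.
  final case class Row(group: Int, nodeId: Int, date: String)

  // Every (a, b) from the same group with a.nodeId < b.nodeId,
  // together with the earlier of the two dates.
  def allPairs(rows: Seq[Row]): Seq[(Int, Int, Int, String)] =
    for {
      a <- rows
      b <- rows
      if a.group == b.group && a.nodeId < b.nodeId
    } yield {
      // Same-format ISO-8601 strings sort chronologically when compared as text.
      val earlier = if (a.date < b.date) a.date else b.date
      (a.group, a.nodeId, b.nodeId, earlier)
    }
}
```

On the sample data this produces four rows, one per nodeId combination within a group, which is what the table above shows. It therefore pairs nodes by nodeId order rather than by consecutive dates, so it differs from the three-row output the question asks for.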