I have an input dataset in the following format:
+---+--------+----------+
| id|   refId| timestamp|
+---+--------+----------+
|  1|    null|1548944642|
|  1|29950529|1548937685|
|  2|27510720|1548944885|
|  2|27510720|1548943617|
+---+--------+----------+
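I need to add a new column, session, using the following transformation logic:
1. If refId is null, session is true.
2. If id and refId are unique, session is true.
3. If id and refId are not unique and the timestamp is greater than the previous row's, session is true. The difference between the timestamps must also be > 60.
Expected output:
+---+--------+-------+----------+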
| id|   refId|session| timestamp|
+---+--------+-------+----------+
|  1|    null|   true|1548944642|
|  1|29950529|   true|1548937685|
|  2|27510720|  false|1548943617|
|  2|27510720|   true|1548944885|
+---+--------+-------+----------+
I was able to handle conditions 1 and 3 separately, but not the second one.
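// Partition by both key columns (two names, not the single string "id, refid") and order by timestamp.
WindowSpec w = Window.partitionBy("id", "refId").orderBy(timestampDS.col("timestamp"));
// Seconds elapsed since the previous row in the same partition; 0 for the first row.
Column diff = functions.coalesce(timestampDS.col("timestamp").cast("long").minus(functions.lag("timestamp", 1).over(w).cast("long")), functions.lit(0));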
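My question is how to satisfy the second condition and implement all three transformations together.
Answer 0 (score: 1)
I would say that using Spark SQL reduces the complexity and gets the result easily: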
df.createOrReplaceTempView("test")
spark.sql("select id,refId,timestamp,case when refId is null and id is not null then 'true' when id is not null and refId is not null and rank=1 then 'true' else 'false' end as session from (select id,refId,timestamp, rank() OVER (PARTITION BY id,refId ORDER BY timestamp DESC) as rank from test) c").show()
The output looks like this:
+---+--------+----------+-------+
| id|   refId| timestamp|session|
+---+--------+----------+-------+
|  1|    null|1548944642|   true|
|  1|29950529|1548937685|   true|
|  2|27510720|1548944885|   true|
|  2|27510720|1548943617|  false|
+---+--------+----------+-------+
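Answer 1 (score: 1)
You can use a window function that partitions by id and refId and orders by timestamp, then add a rank column. Finally, add the session column using the when function or a SQL expression.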
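The partition-and-order idea can be checked without a cluster. Below is a minimal plain-Java sketch, not Spark API code, that groups rows by (id, refId), sorts each group by timestamp, and applies the three rules from the question. The names SessionSketch, sessions and MIN_GAP_SECONDS are hypothetical. It also assumes that the earliest row of a repeated (id, refId) group gets false, which is inferred from the expected output in the question rather than stated there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SessionSketch {
    // Rule 3 threshold from the question: the gap to the previous row must exceed 60.
    static final long MIN_GAP_SECONDS = 60;

    /** Returns one session flag per input row, in input order. */
    public static boolean[] sessions(int[] id, Long[] refId, long[] timestamp) {
        // Group row indices by (id, refId), the same key a Spark window would partition by.
        Map<String, List<Integer>> groups = new HashMap<>();
        for (int i = 0; i < id.length; i++) {
            groups.computeIfAbsent(id[i] + "|" + refId[i], k -> new ArrayList<>()).add(i);
        }

        boolean[] session = new boolean[id.length];
        for (List<Integer> rows : groups.values()) {
            // Order each group by timestamp, ascending, like orderBy("timestamp").
            rows.sort(Comparator.comparingLong((Integer i) -> timestamp[i]));
            for (int k = 0; k < rows.size(); k++) {
                int row = rows.get(k);
                if (refId[row] == null || rows.size() == 1) {
                    // Rule 1 (refId is null) or rule 2 (the pair occurs only once).
                    session[row] = true;
                } else {
                    // Rule 3: needs a previous row in the group and a gap above the threshold.
                    // Assumption: the earliest row of a repeated group is false.
                    session[row] = k > 0
                            && timestamp[row] - timestamp[rows.get(k - 1)] > MIN_GAP_SECONDS;
                }
            }
        }
        return session;
    }

    public static void main(String[] args) {
        // The four sample rows from the question, in their original order.
        int[] id = {1, 1, 2, 2};
        Long[] refId = {null, 29950529L, 27510720L, 27510720L};
        long[] ts = {1548944642L, 1548937685L, 1548944885L, 1548943617L};
        System.out.println(Arrays.toString(sessions(id, refId, ts)));
        // prints [true, true, true, false]
    }
}
```

In Spark, the same shape would map onto a window partitioned by id and refId and ordered by timestamp, with a count over the partition for rule 2 and a lag for rule 3, combined in a single when expression.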