如何在Apache Spark Java或Scala中实现此目标?

时间:2018-08-01 02:14:45

标签: apache-spark apache-spark-sql apache-spark-dataset apache-spark-2.0

在旅途开始时,汽车上的设备不会发送TRIP ID,而在TRIP结束时将发送TRIP ID。如何将相应的TRIP IDS应用于相应的记录

09:30,25,DEVICE_1
10:30,55,DEVICE_1
10:25,0,DEVICE_1,TRIP_ID_0
11:30,45,DEVICE_1
10:30,55,DEVICE_2
10:30,55,DEVICE_3
11:30,45,DEVICE_3
12:30,0,DEVICE_3,TRIP_ID_3
10:30,55,DEVICE_4
11:30,45,DEVICE_4
11:30,45,DEVICE_2
12:30,0,DEVICE_2,TRIP_ID_2
12:30,0,DEVICE_4,TRIP_ID_4
10:30,55,DEVICE_5
11:30,45,DEVICE_5
12:30,0,DEVICE_5,TRIP_ID_5
12:30,0,DEVICE_1,TRIP_ID_1

所以上面变成了这样,

09:30,25,DEVICE_1,TRIP_ID_0
10:25,0,DEVICE_1,TRIP_ID_0
10:30,55,DEVICE_1,TRIP_ID_1
11:30,45,DEVICE_1,TRIP_ID_1
12:30,0,DEVICE_1,TRIP_ID_1
10:30,55,DEVICE_2,TRIP_ID_2
11:30,45,DEVICE_2,TRIP_ID_2
12:30,0,DEVICE_2,TRIP_ID_2
10:30,55,DEVICE_3,TRIP_ID_3
11:30,45,DEVICE_3,TRIP_ID_3
12:30,0,DEVICE_3,TRIP_ID_3
10:30,55,DEVICE_4,TRIP_ID_4
11:30,45,DEVICE_4,TRIP_ID_4
12:30,0,DEVICE_4,TRIP_ID_4
10:30,55,DEVICE_5,TRIP_ID_5
11:30,45,DEVICE_5,TRIP_ID_5
12:30,0,DEVICE_5,TRIP_ID_5

1 个答案:

答案 0 :(得分:2)

一个有趣的问题。必须修复一个错误!

当我在ORACLE中尝试此操作时,您将需要转换为spark.sql。但是spark.sql支持WITH子句。另外,由于事实已经很晚了,所以我不使用日期字符串,而是使用数字来表示时间,所以您需要注意一下。

但这是您可以适应的SQL。

with X as (select device, time_asc, trip_id from trips where trip_id is not null)
select Y.TRIP_ID, Y.DEVICE, Y.TIME_ASC FROM (
select T1.TIME_ASC, T1.DEVICE, X.TRIP_ID, X.TIME_ASC AS TIME_ASC_COMPARE
      ,RANK() OVER (PARTITION BY T1.TIME_ASC, T1.DEVICE ORDER BY X.TIME_ASC) AS RANK_VAL       from trips T1, X
 where T1.DEVICE = X.DEVICE
   and T1.TIME_ASC <= X.TIME_ASC) Y
 where RANK_VAL = 1
 order by TRIP_ID, TIME_ASC

摆脱顺序,只是用来显示。

此数据作为输入:

 ('1','A',null);
 ('2','A','TRIP_01');
 ('5','A',null);
 ('6','A',null);
 ('7','A',null);
 ('23','A','TRIP_02');
 ('56','A',null);
 ('60','A','TRIP_04');
 ('8','B',null);
 ('10','B','TRIP_03');
 ('1','E',null);
 ('2','E','TRIP_05');

在我导出并获得此格式时删除引号,返回以下内容,我认为这将满足您的需要-再次请问格式化:

 ('TRIP_01','A','1');
 ('TRIP_01','A','2');
 ('TRIP_02','A','5');
 ('TRIP_02','A','6');
 ('TRIP_02','A','7');
 ('TRIP_02','A','23');
 ('TRIP_03','B','8');
 ('TRIP_03','B','10');
 ('TRIP_04','A','56');
 ('TRIP_04','A','60');
 ('TRIP_05','E','1');
 ('TRIP_05','E','2');

我想知道SPARK在引擎盖下的表现如何。这在深夜花费了一些精力,因此需要一些赞赏。也很愉快。