Denormalizing two DataFrames using Spark Streaming in Java

Time: 2019-03-29 05:04:33

Tags: java apache-spark apache-spark-sql spark-streaming

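Using Spark Streaming in Java, I am trying to denormalize two DataFrames into a single flattened DataFrame. The practices DataFrame can contain duplicate records (for the primary key practice_id), so I want to filter out the older records (based on the updated_ts column) when joining.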

Practices DataFrame:

+--------------------+----------------+-----------+------------------+
|          updated_ts|   practice_name|practice_id|primary_address_id|
+--------------------+----------------+-----------+------------------+
|2019-03-23T17:08:42Z|Fal Vet Shop    |          1|                 1|
|2019-03-29T03:06:42Z|Fal Vet Shop AAA|          1|                 1|
|2019-03-27T01:45:26Z|Test Shop       |          2|                 2|
+--------------------+----------------+-----------+------------------+

Addresses DataFrame:

+--------------------+------------+------------+--------+------------+----------+----------+-----------+
|          updated_ts|country_code|address_type|    city|    address1|address_id|state_code|postal_code|
+--------------------+------------+------------+--------+------------+----------+----------+-----------+
|2019-01-20T20:10:39Z|          US|        HOME|Falmouth|5 Country Ln|         1|        ME|      04105|
|2019-01-20T15:09:09Z|          US|         BIZ|Falmouth|13 Main St. |         2|        ME|      04105|
+--------------------+------------+------------+--------+------------+----------+----------+-----------+


How can I get the records below? Before joining the two DataFrames I tried dfPractices.dropDuplicates("practice_id"), but it kept the older practice record (practice_name = Fal Vet Shop) instead of the one shown below (practice_name = Fal Vet Shop AAA).

+--------------------+----------------+-----------+------------------+--------------------+------------+------------+--------+------------+----------+----------+-----------+
|          updated_ts|   practice_name|practice_id|primary_address_id|          updated_ts|country_code|address_type|    city|    address1|address_id|state_code|postal_code|
+--------------------+----------------+-----------+------------------+--------------------+------------+------------+--------+------------+----------+----------+-----------+
|2019-03-29T03:06:42Z|Fal Vet Shop AAA|          1|                 1|2019-01-20T20:10:39Z|          US|        HOME|Falmouth|5 Country Ln|         1|        ME|      04105|
|2019-03-27T01:45:26Z|Test Shop       |          2|                 2|2019-01-20T15:09:09Z|          US|         BIZ|Falmouth|13 Main St. |         2|        ME|      04105|
+--------------------+----------------+-----------+------------------+--------------------+------------+------------+--------+------------+----------+----------+-----------+

0 answers:

No answers