How to select from two tables based on each row's columns using spark.sql

Time: 2019-05-08 16:34:55

Tags: sql apache-spark dataframe

Basically, I have two tables, as shown below:

scala> invalidTime.printSchema
root
 |-- machine_id: string (nullable = true)
 |-- time_stamp: double (nullable = true)

scala> containerUsage.printSchema
root
 |-- container_id: string (nullable = true)
 |-- machine_id: string (nullable = true)
 |-- time_stamp: double (nullable = true)
 |-- cpu_util_percent: double (nullable = true)
 |-- mem_util_percent: double (nullable = true)
 |-- cpi: double (nullable = true)
 |-- mem_gps: double (nullable = true)
 |-- mpki: integer (nullable = true)
 |-- net_in: double (nullable = true)
 |-- net_out: double (nullable = true)
 |-- disk_io_percent: double (nullable = true)

I want to select the rows from containerUsage whose {machine_id, time_stamp} pair can be found in invalidTime.

I tried:

WHERE containerUsage.machine_id = invalidTime.machine_id AND containerUsage.time_stamp = invalidTime.time_stamp

But this selects the rows whose time_stamp can be found in invalidTime or whose machine_id can be found there, not both in the same row.

I want to get the rows whose {machine_id, time_stamp} combination can be found in invalidTime (the braces are only notation for the pair, not an actual array or struct).
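The difference between matching the two columns independently and matching the {machine_id, time_stamp} pair within a single row can be sketched with plain Scala collections. The sample rows below are made up for illustration, and the independent-column filter is one likely reading of the unwanted behaviour:

```scala
// Hypothetical rows standing in for the two DataFrames (assumption, not the asker's data).
case class Invalid(machineId: String, timeStamp: Double)
case class Usage(containerId: String, machineId: String, timeStamp: Double)

val invalidTime = Seq(Invalid("m1", 1.0), Invalid("m2", 2.0))
val containerUsage = Seq(
  Usage("c1", "m1", 1.0), // pair (m1, 1.0) appears as one row of invalidTime -> keep
  Usage("c2", "m1", 2.0), // machine_id matches row 1, time_stamp matches row 2, pair matches neither -> drop
  Usage("c3", "m3", 3.0)  // no match at all -> drop
)

// Matching each column independently (the unwanted behaviour): c2 slips through.
val machineIds = invalidTime.map(_.machineId).toSet
val timeStamps = invalidTime.map(_.timeStamp).toSet
val independent = containerUsage.filter(u => machineIds(u.machineId) && timeStamps(u.timeStamp))

// Matching the pair within a single invalidTime row (the wanted behaviour).
val pairs = invalidTime.map(r => (r.machineId, r.timeStamp)).toSet
val paired = containerUsage.filter(u => pairs((u.machineId, u.timeStamp)))

println(independent.map(_.containerId)) // List(c1, c2)
println(paired.map(_.containerId))      // List(c1)
```

In Spark itself, the pair-matching version corresponds to a left-semi join on both columns, e.g. `containerUsage.join(invalidTime, Seq("machine_id", "time_stamp"), "left_semi")`, or in spark.sql an `EXISTS` subquery correlated on both columns.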

0 Answers:

There are no answers.