Question

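I have multiple tables that need to be joined on several common attributes, so that their distinct attributes can be displayed in a single table.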

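Table1

+--------+---------+-------+
|  make  |  model  | r_yr  |
+--------+---------+-------+
| toyota | corolla |  1999 |
| toyota |  camry  |  2002 |
| toyota |  qualis |  2004 |
| toyota |  rav4   |  2006 |
+--------+---------+-------+

Table2

+--------+---------+--------+
|  make  |  model  |  kms   |
+--------+---------+--------+
| toyota | corolla |  25000 |
| toyota |  camry  |  50000 |
+--------+---------+--------+

Table3

+--------+---------+---------+
|  make  |  model  | mileage |
+--------+---------+---------+
| toyota | corolla |      20 |
| toyota |  qualis |      25 |
+--------+---------+---------+

Table4

+--------+----------+-------+
|  make  |  model   | colr  |
+--------+----------+-------+
| toyota |  camry   | blue  |
| toyota |  rav4    | green |
+--------+----------+-------+

I am doing the following to join them: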

select a.make, a.model,a.r_yr,b.kms,c.mileage,d.colr
    from table1 as a
    left join table2 as b
    on b.make=a.make and b.model=a.model and b.r_yr=a.r_yr
    left join table3 as c
    on c.make=a.make and c.model=a.model and c.r_yr=a.r_yr
    left join table4 as d
    on d.make=a.make and d.model=a.model and d.r_yr=a.r_yr

This gives the following result table:

+--------+---------+-------+-------+----------+--------+
|  make  |  model  | r_yr  |  kms  |  mileage |  colr  |
+--------+---------+-------+-------+----------+--------+
| toyota | corolla |  1999 | 25000 |       20 |        |
| toyota |  camry  |  2002 | 50000 |          |  blue  |
| toyota |  qualis |  2004 |       |       25 |        |
| toyota |  rav4   |  2006 |       |          | green  |
+--------+---------+-------+-------+----------+--------+

However, the problem I'm facing is that, in the actual dataset I'm working with, there are 5 common cols and each table has about 20 unique attributes, so the query has to list some 20-40 col names such as b.kms, ...., c.mileage, ......, d.colr, .... Is there any way around spelling out all of these unique columns, for example by selecting everything except the common cols, or some other approach?

Answer 1

You can't do something like SELECT all except x,y,z ...
But you can simplify this query by using a USING clause instead of JOIN ... ON

Demo: http://sqlfiddle.com/#!17/fa97a/6

select *
from table1 as a
left join table2 as b
USING (make, model)
left join table3 as c
USING (make, model)
left join table4 as d
USING (make, model)

|   make |   model | r_yr |    kms | mileage |   colr |
|--------|---------|------|--------|---------|--------|
| toyota |   camry | 2002 |  50000 |  (null) |   blue |
| toyota | corolla | 1999 |  25000 |      20 | (null) |
| toyota |  qualis | 2004 | (null) |      25 | (null) |
| toyota |    rav4 | 2006 | (null) |  (null) |  green |

Note: in the example above I used only two common columns (make, model), because in your example r_yr is not a common column; it exists only in table1.
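If you want to try the USING approach locally, here is a minimal sketch. It makes a few assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the PostgreSQL fiddle, and the rows of table1 are inferred from the result table in the question. SQLite, like PostgreSQL, returns each USING column only once for SELECT *, so the common columns are not repeated.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (make TEXT, model TEXT, r_yr INTEGER);
    CREATE TABLE table2 (make TEXT, model TEXT, kms INTEGER);
    CREATE TABLE table3 (make TEXT, model TEXT, mileage INTEGER);
    CREATE TABLE table4 (make TEXT, model TEXT, colr TEXT);

    -- table1 rows are inferred from the result table in the question (assumption)
    INSERT INTO table1 VALUES
        ('toyota', 'corolla', 1999), ('toyota', 'camry', 2002),
        ('toyota', 'qualis', 2004), ('toyota', 'rav4', 2006);
    INSERT INTO table2 VALUES ('toyota', 'corolla', 25000), ('toyota', 'camry', 50000);
    INSERT INTO table3 VALUES ('toyota', 'corolla', 20), ('toyota', 'qualis', 25);
    INSERT INTO table4 VALUES ('toyota', 'camry', 'blue'), ('toyota', 'rav4', 'green');
""")

# SELECT * with USING: no per-table column list is needed,
# and make/model appear only once in the output.
cursor = conn.execute("""
    SELECT *
    FROM table1 AS a
    LEFT JOIN table2 AS b USING (make, model)
    LEFT JOIN table3 AS c USING (make, model)
    LEFT JOIN table4 AS d USING (make, model)
    ORDER BY a.model
""")

columns = [col[0] for col in cursor.description]  # output column names
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

The column list comes out as make, model, r_yr, kms, mileage, colr, and unmatched attributes come back as None (NULL). With wider tables the query text stays the same length; only the USING list grows with the number of common columns.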

Join wide tables (10 unique cols)
