我有多个表,需要连接多个公共属性,这样不同的属性可以显示在一个表中。
表1
+--------+---------+--------+
| make | model | kms |
+--------+---------+--------+
| toyota | corolla | 25000 |
| toyota | camry | 50000 |
+--------+---------+--------+
表2
+--------+---------+---------+
| make | model | mileage |
+--------+---------+---------+
| toyota | corolla | 20 |
| toyota | qualis | 25 |
+--------+---------+---------+
表4
+--------+----------+-------+
| make | model | colr |
+--------+----------+-------+
| toyota | camry | blue |
| toyota | rav4 | green |
+--------+----------+-------+
表5
select a.make, a.model,a.r_yr,b.kms,c.mileage,d.colr
from table1 as a
left join table2 as b
on b.make=a.make and b.model=a.model and b.r_yr=a.r_yr
left join table3 as c
on c.make=a.make and c.model=a.model and c.r_yr=a.r_yr
left join table4 as d
on d.make=a.make and d.model=a.model and d.r_yr=a.r_yr
我正在执行以下操作来加入结果
+--------+---------+-------+-------+----------+--------+
| make | model | r_yr | kms | mileage | colr |
+--------+---------+-------+-------+----------+--------+
| toyota | corolla | 1999 | 25000 | 20 | |
| toyota | camry | 2002 | 50000 | | blue |
| toyota | qualis | 2004 | | 25 | |
| toyota | rav4 | 2006 | | | green |
+--------+---------+-------+-------+----------+--------+
这给出了如下表格
common cols
然而,我遇到的问题是,对于我正在使用的实际数据集,每个表有5 unique attributes
,每个表大约需要指定20 b.kms, ....,c.mileage, ......,d.colr,....
common cols
查询中的-40 col名称为BufferedImage image = null; //use your image here
//Convert image to OpenCv Mat
byte[] pixels = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
Mat mat = new Mat(image.getHeight(), image.getWidth(), CvType.CV_8UC3);
mat.put(0, 0, pixels);
//do something with the Mat e.g:
Imgproc.threshold(...);
//Convert back
mat.get(0, 0, pixels);
。通过指定除case class Linkedin_Profile(experience : Array[Experience])
case class Experience(company : String)
val rdd = MongoSpark.load(sc, ReadConfig(Map("uri" -> mongo_uri_linkedin)))
val company_DS = rdd.toDS[Linkedin_Profile]()
val count_udf = udf((x: scala.collection.mutable.WrappedArray[String]) => {x.filter( _ != null).groupBy(identity).mapValues(_.size)})
val company_ColCount = company_DS.select(explode(count_udf($"experience.company")))
comp_rdd.saveAsTextFile("/dbfs/FileStore/chandan/intermediate_count_results.csv")
之外的所有方式或其他方式,是否有必要解决这些唯一列的问题?
答案 0 :(得分:4)
您无法执行SELECT all except x,y,z ...
之类的操作
但您可以使用USING
子句而不是JOIN ... ON
演示:http://sqlfiddle.com/#!17/fa97a/6
select *
from table1 as a
left join table2 as b
USING (make, model)
left join table3 as c
USING (make, model)
left join table4 as d
USING (make, model)
| make | model | r_yr | kms | mileage | colr |
|--------|---------|------|--------|---------|--------|
| toyota | camry | 2002 | 50000 | (null) | blue |
| toyota | corolla | 1999 | 25000 | 20 | (null) |
| toyota | qualis | 2004 | (null) | 25 | (null) |
| toyota | rav4 | 2006 | (null) | (null) | green |
注意:在上面的示例中,我只使用了两个常见列(make, model)
,因为在您的示例中r_yr
不是公共列,因为它只在table1中