I have a table containing an id and a value:
| id | value |
-----------------
| 1 | UnKnown |
| 1 | A |
| 2 | UnKnown |
| 2 | UnKnown |
| 3 | B |
| 3 | B |
| 3 | B |
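I need to select the distinct ids and their corresponding values from the table. Each id should appear only once in the result; if an id has multiple values in the value field, only the value that is not UnKnown should be returned.
So the result should look like this: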
| id | value |
-----------------
| 1 | A |
| 2 | UnKnown |
| 3 | B |
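How can I implement a group by with a condition such as "if every value is UnKnown then keep UnKnown, otherwise take the value", in SQL or in Spark Scala?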
Answer 0 (score: 0)
Here is an example using Scala and Spark 2.0.0 SQL. You can try it in spark-shell.
val v = Seq((1,"Unknown"),(1,"A"),(2,"Unknown"),(2,"Unknown"),(3,"B"),(3,"B"),(3,"B")).toDF("id","value")
v.show
v.createOrReplaceTempView("v1")
spark.sql("select * from v1 where value!='Unknown' union (select * from v1 a where (select count (*) from v1 b where a.id=b.id and b.value!='Unknown')<1)").show
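The same result can probably also be reached with a plain group by instead of a union. The following is only a sketch, not the original answerer's method. It assumes spark-shell on Spark 2.0 or later, that the placeholder is spelled exactly `Unknown` as in the sample data above, and that each id has at most one distinct non-Unknown value (if there were several, `max` would simply keep the lexicographically largest). The names `df` and `result` are illustrative.

```scala
// Sketch for spark-shell (Spark 2.0+), where `spark` is already defined.
import org.apache.spark.sql.functions.{coalesce, col, lit, max, when}
import spark.implicits._

// Same sample rows as in the answer above; `df` and `result` are illustrative names.
val df = Seq((1, "Unknown"), (1, "A"), (2, "Unknown"), (2, "Unknown"), (3, "B"), (3, "B"), (3, "B"))
  .toDF("id", "value")

val result = df
  .groupBy("id")
  .agg(
    coalesce(
      // when(...) without otherwise yields null for "Unknown" rows, and max skips nulls
      max(when(col("value") =!= "Unknown", col("value"))),
      // no non-Unknown value was found for this id, so fall back to the placeholder
      lit("Unknown")
    ).as("value")
  )
  .orderBy("id")

result.show()
```

Because the aggregation collapses each id to a single row, no separate deduplication step is needed, and the table is scanned once rather than through a correlated subquery.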