How do I add two columns together in Scala and sort by id?

Time: 2019-03-11 23:37:39

Tags: scala apache-spark

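My df has the columns id, col1, col2. I want to sum col1 and col2 into a new column cols, and then show id and cols ordered by id. I know I can compute the sum with df.select($"col1" + $"col2").orderBy(desc("id")), but df.select($"col1" + $"col2") drops id, so I can no longer orderBy id. Any ideas?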

2 answers:

Answer 0 (score: 1)

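import org.apache.spark.sql.functions.desc
import spark.implicits._ // provides the $"colName" syntax; assumes a SparkSession named spark (as in spark-shell)
df.withColumn("cols", $"col1" + $"col2").orderBy(desc("id")).select($"id", $"cols")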

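withColumn returns a new DataFrame that contains the additional column "cols". orderBy is then applied to the "id" column, and finally the "id" and "cols" columns are selected. Alternatively, you can call drop(colNames: String*) after orderBy to remove the columns you don't need.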
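The original select fails only because id is left out of the projection; including it alongside the aliased sum also works, with no withColumn needed. A minimal sketch, assuming spark-sql is on the classpath and using a throwaway local SparkSession with made-up sample data (the appName and the values are illustrative only):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

// Assumption: spark-sql is on the classpath; a local session is used only for illustration.
val spark = SparkSession.builder().master("local[*]").appName("sum-cols-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data with the question's columns: id, col1, col2.
val df = Seq((2, 10, 20), (1, 5, 30), (3, 25, 15)).toDF("id", "col1", "col2")

// Keep id in the projection so orderBy can still reference it,
// and name the computed sum "cols".
val result = df
  .select($"id", ($"col1" + $"col2").as("cols"))
  .orderBy(desc("id"))

result.show()
```

With this data the rows come out as (3, 40), (2, 30), (1, 35). Use orderBy("id") instead of desc("id") if ascending order is wanted.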

scala> val df = Seq((2, 10, 20), (1, 5, 30), (3, 25, 15)).toDS.select($"_1" as "id", $"_2" as "col1", $"_2" as "col2")
df: org.apache.spark.sql.DataFrame = [id: int, col1: int ... 1 more field]

scala> df.show
+---+----+----+
| id|col1|col2|
+---+----+----+
|  2|  10|  10|
|  1|   5|   5|
|  3|  25|  25|
+---+----+----+


scala> df.withColumn("cols", $"col1" + $"col2").orderBy(desc("id")).select($"id", $"cols").show
+---+----+
| id|cols|
+---+----+
|  3|  50|
|  2|  20|
|  1|  10|
+---+----+


scala> df.withColumn("cols", $"col1" + $"col2").orderBy(desc("id")).drop("col1", "col2").show
+---+----+
| id|cols|
+---+----+
|  3|  50|
|  2|  20|
|  1|  10|
+---+----+


scala> df.withColumn("cols", $"col1" + $"col2").orderBy(desc("id")).show
+---+----+----+----+
| id|col1|col2|cols|
+---+----+----+----+
|  3|  25|  25|  50|
|  2|  10|  10|  20|
|  1|   5|   5|  10|
+---+----+----+----+
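If SQL syntax is more convenient, Dataset.selectExpr accepts SQL expression strings, so the projection and the alias fit in a single call. A sketch, assuming spark-sql is on the classpath and using a local SparkSession and hypothetical sample data shaped like the question's df:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: spark-sql is on the classpath; local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("selectexpr-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data with columns id, col1, col2.
val df = Seq((2, 10, 20), (1, 5, 30), (3, 25, 15)).toDF("id", "col1", "col2")

// selectExpr parses each string as a SQL expression; "id" is projected
// next to the sum, so it is still available for the ordering.
val viaExpr = df.selectExpr("id", "col1 + col2 AS cols").orderBy($"id".desc)

viaExpr.show()
```

The result is the same as with the Column-based API; which form to use is mostly a matter of taste.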

Answer 1 (score: 1)

As shown below.

val df = Seq(("Edward", 1, 1, 1000, "me1@example.com"),
  ("Michal", 3, 2, 15000, "me1@example.com"),
  ("Steve", 7, 3, 25000, "you@example.com"),
  ("Jordan", 2, 4, 40000, "me1@example.com")).
  toDF("Name", "ID1", "ID2", "Salary", "MailId")

df.show()

+------+---+---+------+---------------+
|  Name|ID1|ID2|Salary|         MailId|
+------+---+---+------+---------------+
|Edward|  1|  1|  1000|me1@example.com|
|Michal|  3|  2| 15000|me1@example.com|
| Steve|  7|  3| 25000|you@example.com|
|Jordan|  2|  4| 40000|me1@example.com|
+------+---+---+------+---------------+

val df1 = df.select($"Salary", ($"ID1" + $"ID2").as("ID")).orderBy(desc("Salary"))

df1.show()

+------+---+
|Salary| ID|
+------+---+
| 40000|  6|
| 25000| 10|
| 15000|  5|
|  1000|  2|
+------+---+