My df has columns id col1 col2. I want to sum col1 and col2 into a column cols, then display id cols ordered by id. I know how to compute the sum with df.select($"col1" + $"col2").orderBy(desc("id")), but df.select($"col1" + $"col2") drops id, so I can't orderBy on id. Any ideas?
Answer 0 (score: 1)

df.withColumn("cols", $"col1" + $"col2").orderBy(desc("id")).select($"id", $"cols")

withColumn returns a new DataFrame with the added column "cols". Then orderBy on the "id" column and select the "id" and "cols" columns. Alternatively, you can keep all columns through the orderBy and remove the unwanted ones afterwards with the drop(columnNames*) function.
scala> val df = Seq((2, 10, 20), (1, 5, 30), (3, 25, 15)).toDS.select($"_1" as "id", $"_2" as "col1", $"_3" as "col2")
df: org.apache.spark.sql.DataFrame = [id: int, col1: int ... 1 more field]

scala> df.show
+---+----+----+
| id|col1|col2|
+---+----+----+
|  2|  10|  20|
|  1|   5|  30|
|  3|  25|  15|
+---+----+----+

scala> df.withColumn("cols", $"col1" + $"col2").orderBy(desc("id")).select($"id", $"cols").show
+---+----+
| id|cols|
+---+----+
|  3|  40|
|  2|  30|
|  1|  35|
+---+----+

scala> df.withColumn("cols", $"col1" + $"col2").orderBy(desc("id")).drop("col1", "col2").show
+---+----+
| id|cols|
+---+----+
|  3|  40|
|  2|  30|
|  1|  35|
+---+----+

scala> df.withColumn("cols", $"col1" + $"col2").orderBy(desc("id")).show
+---+----+----+----+
| id|col1|col2|cols|
+---+----+----+----+
|  3|  25|  15|  40|
|  2|  10|  20|  30|
|  1|   5|  30|  35|
+---+----+----+----+
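The withColumn step is not strictly required here: the sum can be aliased directly inside select, which keeps "id" available for orderBy and answers the question in one pass. A minimal self-contained sketch, assuming a local Spark session (the session setup and app name are illustrative, not from the original answer):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

// Hypothetical local session for demonstration only.
val spark = SparkSession.builder.master("local[*]").appName("sum-cols").getOrCreate()
import spark.implicits._

// Same sample data as the answer above.
val df = Seq((2, 10, 20), (1, 5, 30), (3, 25, 15)).toDF("id", "col1", "col2")

// Selecting "id" alongside the aliased sum means nothing is dropped,
// so the result can still be ordered by "id".
df.select($"id", ($"col1" + $"col2").as("cols"))
  .orderBy(desc("id"))
  .show()
```

This avoids materializing the intermediate three-column frame that withColumn produces before the select.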
Answer 1 (score: 1)

You can do it as shown below.
val df = Seq(("Edward",1, 1, 1000,"me1@example.com"),
("Michal",3,2,15000,"me1@example.com"),
("Steve",7,3,25000,"you@example.com"),
("Jordan",2,4,40000, "me1@example.com")).
toDF("Name", "ID1", "ID2","Salary","MailId")
df.show()
+------+---+---+------+---------------+
| Name|ID1|ID2|Salary| MailId|
+------+---+---+------+---------------+
|Edward| 1| 1| 1000|me1@example.com|
|Michal| 3| 2| 15000|me1@example.com|
| Steve| 7| 3| 25000|you@example.com|
|Jordan| 2| 4| 40000|me1@example.com|
+------+---+---+------+---------------+
val df1 = df.select($"Salary", ($"ID1" + $"ID2").as("ID")).orderBy(desc("Salary"))
df1.show()
+------+---+
|Salary| ID|
+------+---+
| 40000| 6|
| 25000| 10|
| 15000| 5|
| 1000| 2|
+------+---+
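The same projection can also be written with SQL expression strings via selectExpr, which some find more readable for simple column arithmetic. A self-contained sketch under the same sample data, assuming a local Spark session (the session setup is illustrative, not from the original answer):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

// Hypothetical local session for demonstration only.
val spark = SparkSession.builder.master("local[*]").appName("selectExpr-demo").getOrCreate()
import spark.implicits._

// Same sample data as this answer.
val df = Seq(("Edward", 1, 1, 1000, "me1@example.com"),
  ("Michal", 3, 2, 15000, "me1@example.com"),
  ("Steve", 7, 3, 25000, "you@example.com"),
  ("Jordan", 2, 4, 40000, "me1@example.com")).
  toDF("Name", "ID1", "ID2", "Salary", "MailId")

// selectExpr accepts SQL expression strings, including inline aliases,
// so the sum and the rename happen in one string.
val df1 = df.selectExpr("Salary", "ID1 + ID2 as ID").orderBy(desc("Salary"))
df1.show()
```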