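I want to select a few columns, add or divide some columns, fill some columns with spaces, and store the results under new names as aliases. For example, in SQL this would look something like: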
select " " as col1, b as b1, c+d as e from table
How can I achieve this in Spark?
Answer 0: (score: 4)
You can also use the native DataFrame functions. For example:
import org.apache.spark.sql.functions._
val df1 = Seq(
("A",1,5,3),
("B",3,4,2),
("C",4,6,3),
("D",5,9,1)).toDF("a","b","c","d")
Select the columns like this:
df1.select(lit(" ").as("col1"),
col("b").as("b1"),
(col("c") + col("d")).as("e"))
This gives you the expected result:
+----+---+---+
|col1| b1|  e|
+----+---+---+
|    |  1|  8|
|    |  3|  6|
|    |  4|  9|
|    |  5| 10|
+----+---+---+
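The question also mentions dividing columns, and the same `select` pattern covers that case. Below is a minimal sketch, assuming a local `SparkSession` (in spark-shell, `spark` already exists) and a hypothetical alias `ratio` that does not appear in the original question. Note that `/` on two integer columns produces a double-typed column in Spark.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-alias-sketch")
  .getOrCreate()
import spark.implicits._ // enables toDF on a Seq of tuples

// Same sample data as in the answer
val df1 = Seq(
  ("A", 1, 5, 3),
  ("B", 3, 4, 2),
  ("C", 4, 6, 3),
  ("D", 5, 9, 1)).toDF("a", "b", "c", "d")

// "ratio" is a hypothetical alias chosen for this example
val divided = df1.select(
  lit(" ").as("col1"),               // constant blank column
  col("b").as("b1"),                 // plain rename
  (col("c") / col("d")).as("ratio")) // integer columns divide to a double

divided.show()
```

Any arithmetic expression built from `Column` values can be aliased with `.as(...)` in the same way.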
Answer 1: (score: 3)
You can do the same thing with Spark SQL.
import org.apache.spark.sql.functions._
val df1 = Seq(
("A",1,5,3),
("B",3,4,2),
("C",4,6,3),
("D",5,9,1)).toDF("a","b","c","d")
df1.createOrReplaceTempView("table")
df1.show()
// Keep the DataFrame in df2; calling .show() inline would assign Unit instead
val df2 = spark.sql("select ' ' as col1, b as b1, c+d as e from table")
df2.show()
Input:
+---+---+---+---+
|  a|  b|  c|  d|
+---+---+---+---+
|  A|  1|  5|  3|
|  B|  3|  4|  2|
|  C|  4|  6|  3|
|  D|  5|  9|  1|
+---+---+---+---+
Output:
+----+---+---+
|col1| b1|  e|
+----+---+---+
|    |  1|  8|
|    |  3|  6|
|    |  4|  9|
|    |  5| 10|
+----+---+---+
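If registering a temporary view seems heavy for a one-off projection, `selectExpr` may be a convenient middle ground: it accepts SQL expression strings directly on the DataFrame. The sketch below assumes a local `SparkSession` (spark-shell already provides `spark`) and reuses the sample data from the answer. It should yield the same three columns as the output above.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-expr-sketch")
  .getOrCreate()
import spark.implicits._ // enables toDF on a Seq of tuples

val df1 = Seq(
  ("A", 1, 5, 3),
  ("B", 3, 4, 2),
  ("C", 4, 6, 3),
  ("D", 5, 9, 1)).toDF("a", "b", "c", "d")

// Each string is parsed as a SQL expression; no temp view is needed
val viaExpr = df1.selectExpr("' ' as col1", "b as b1", "c + d as e")

viaExpr.show()
```

The choice between `select` with `Column` objects, `selectExpr`, and `spark.sql` is mostly a matter of style; all three go through the same query planner.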