I have a table with an array column, like this:
+-Name-+
array
0: {"given_name":"B. A.", "surname":"Name1"}
1: {"given_name":"A.", "surname":"Name2"}
2: {"given_name":"C.", "surname":"Name3"}
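I want to add an "index" field, starting at 1, to each element of the array so that it records the author's position in the sequence, for example: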
+-Name-+
array
0: {"given_name":"B. A.", "surname":"Name1", "index":"1"}
1: {"given_name":"A.", "surname":"Name2", "index":"2"}
2: {"given_name":"C.", "surname":"Name3", "index":"3"}
How can I do this in Scala? Thanks a lot for your help.
Answer (score: 2)
Here is one approach using a UDF that maps each element of the array-type column to a struct that also includes the element's index:
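import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell
// Input element type, and output element type with an added 1-based index
case class Name(given_name: String, surname: String)
case class NameIdx(given_name: String, surname: String, index: Int)
val df = Seq(
  Seq(Name("John", "Doe"), Name("Jane", "Smith"), Name("Mike", "Davis")),
  Seq(Name("Rachel", "Smith"), Name("Steve", "Thompson"))
).toDF("name")
// zipWithIndex numbers elements by position; names.indexOf(name) would return the
// first equal element's position and mislabel duplicate names within one array
val addIndex = udf((names: Seq[Row]) => names.zipWithIndex.map {
  case (Row(gn: String, sn: String), i) => NameIdx(gn, sn, i + 1)
})
df.select(addIndex($"name").as("name")).show(false)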
// +----------------------------------------------+
// |name |
// +----------------------------------------------+
// |[[John,Doe,1], [Jane,Smith,2], [Mike,Davis,3]]|
// |[[Rachel,Smith,1], [Steve,Thompson,2]] |
// +----------------------------------------------+
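The body of addIndex is ordinary Scala collection code, so its numbering logic can be checked without Spark. The sketch below is a plain-Scala approximation that assumes each struct is modeled as a case class (the wrapper object IndexSketch and its two functions are hypothetical names). It contrasts positional numbering via zipWithIndex with an indexOf lookup, which returns the first matching position and therefore repeats an index when two authors share the same name.

```scala
// Plain-Scala sketch of the per-row logic inside the addIndex UDF (no Spark needed).
// Hypothetical wrapper; Name and NameIdx mirror the case classes from the answer.
object IndexSketch {
  final case class Name(given_name: String, surname: String)
  final case class NameIdx(given_name: String, surname: String, index: Int)

  // Positional numbering: zipWithIndex is 0-based, so add 1 for a 1-based index.
  def addIndex(names: Seq[Name]): Seq[NameIdx] =
    names.zipWithIndex.map { case (Name(gn, sn), i) => NameIdx(gn, sn, i + 1) }

  // Pitfall: indexOf returns the position of the FIRST equal element,
  // so a repeated name gets the index of its first occurrence.
  def addIndexViaIndexOf(names: Seq[Name]): Seq[NameIdx] =
    names.map(n => NameIdx(n.given_name, n.surname, names.indexOf(n) + 1))

  def main(args: Array[String]): Unit = {
    val authors = Seq(Name("A.", "Smith"), Name("B.", "Jones"), Name("A.", "Smith"))
    println(addIndex(authors).map(_.index))           // List(1, 2, 3)
    println(addIndexViaIndexOf(authors).map(_.index)) // List(1, 2, 1)
  }
}
```

Since Spark's Row compares by value, the same duplicate-name issue would apply to an indexOf-based UDF over Seq[Row].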
To produce JSON strings, apply to_json as shown below:
df.select(to_json(addIndex($"name")).as("name")).show(false)
// +-----------------------------------------------------------------------------------------------------------------------------------------------------+
// |name |
// +-----------------------------------------------------------------------------------------------------------------------------------------------------+
// |[{"given_name":"John","surname":"Doe","index":1},{"given_name":"Jane","surname":"Smith","index":2},{"given_name":"Mike","surname":"Davis","index":3}]|
// |[{"given_name":"Rachel","surname":"Smith","index":1},{"given_name":"Steve","surname":"Thompson","index":2}] |
// +-----------------------------------------------------------------------------------------------------------------------------------------------------+
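The question's sample output stores the index as a string ("index":"1"), whereas NameIdx above declares index: Int, so to_json emits an unquoted number. Declaring the field as String and passing (i + 1).toString in the UDF should make to_json quote it. The sketch below only illustrates that target shape in plain Scala; the object JsonShapeSketch, the case class NameIdxStr, and the hand-rolled serializer are hypothetical and are not a substitute for to_json.

```scala
// Illustrative only: mimics the JSON shape the question asks for, with "index" as a
// 1-based *string*. In Spark you would still use to_json rather than hand-rolled output.
object JsonShapeSketch {
  // Hypothetical variant of NameIdx whose index is a String, matching "index":"1".
  final case class NameIdxStr(given_name: String, surname: String, index: String)

  // Attach 1-based string indices by position (zipWithIndex is 0-based, hence + 1).
  def withStringIndex(names: Seq[(String, String)]): Seq[NameIdxStr] =
    names.zipWithIndex.map { case ((gn, sn), i) => NameIdxStr(gn, sn, (i + 1).toString) }

  // Escapes only backslashes and double quotes; not a complete JSON encoder.
  private def esc(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"")

  // Serialize each element as an object and wrap the list in [ ... ].
  def toJson(names: Seq[NameIdxStr]): String =
    names.map { n =>
      s"""{"given_name":"${esc(n.given_name)}","surname":"${esc(n.surname)}","index":"${esc(n.index)}"}"""
    }.mkString("[", ",", "]")

  def main(args: Array[String]): Unit =
    println(toJson(withStringIndex(Seq("B. A." -> "Name1", "A." -> "Name2", "C." -> "Name3"))))
}
```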