I have an RDD like this:
+---------+--------+
|firstName|lastName|
+---------+--------+
|     john|   smith|
|     anna|  tourde|
+---------+--------+
I want to add a new column the way zipWithIndex does, but with the index starting at 8 instead of 0:
+---------+--------+-----+
|firstName|lastName|index|
+---------+--------+-----+
|     john|   smith|    8|
|     anna|  tourde|    9|
+---------+--------+-----+
Any ideas? Thanks.
Answer 0 (score: 4)
// zipWithIndex numbers elements from 0, so shift each index by 8
rdd.zipWithIndex().map { case (v, ind) =>
  (v, ind + 8)
}
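For anyone who wants to try this end to end, here is a minimal sketch under a few assumptions: a local SparkSession (the master URL and app name are illustrative, not from the question), the two sample rows held as tuples, and a hypothetical `start` value holding the offset. The object name `ZipWithOffset` is also made up for the example.

```scala
import org.apache.spark.sql.SparkSession

object ZipWithOffset {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("zip-with-offset")
      .getOrCreate()

    // Sample rows taken from the question, stored as (firstName, lastName) tuples.
    val rdd = spark.sparkContext.parallelize(Seq(("john", "smith"), ("anna", "tourde")))

    // zipWithIndex yields Long indices starting at 0; add the desired start value.
    val start = 8L
    val indexed = rdd.zipWithIndex().map { case ((first, last), ind) =>
      (first, last, ind + start)
    }

    // Collect to the driver and print each (firstName, lastName, index) tuple.
    indexed.collect().foreach(println)
    spark.stop()
  }
}
```

Note that zipWithIndex assigns indices by partition order and then by position within each partition, so the numbering follows whatever order the RDD already has.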
Answer 1 (score: 2)
Use zipWithIndex on the underlying RDD and convert the result back to a DataFrame, as shown below:
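import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val df1 = spark.createDataFrame(
  df.rdd.zipWithIndex.map {
    // Append the zero-based index, shifted so that it starts at 8
    case (row, index) => Row.fromSeq(row.toSeq :+ (index + 8))
  },
  // Extend the original schema with a non-nullable Long index column
  StructType(df.schema.fields :+ StructField("index", LongType, nullable = false)))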
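If the start value needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: `withIndexFrom`, its parameter names, and the `IndexColumn` object are hypothetical, and the local SparkSession settings in `main` are assumptions for the demo.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object IndexColumn {
  // Hypothetical helper: appends a Long column numbered from `start`.
  def withIndexFrom(df: DataFrame, start: Long, name: String = "index"): DataFrame = {
    // zipWithIndex is zero-based, so shift every index by `start`.
    val rows = df.rdd.zipWithIndex().map { case (row, i) =>
      Row.fromSeq(row.toSeq :+ (i + start))
    }
    // Original schema plus the new non-nullable index column.
    val schema = StructType(df.schema.fields :+ StructField(name, LongType, nullable = false))
    df.sparkSession.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("index-column")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question.
    val df = Seq(("john", "smith"), ("anna", "tourde")).toDF("firstName", "lastName")
    withIndexFrom(df, start = 8L).show()
    spark.stop()
  }
}
```

For the sample rows this should give john the index 8 and anna the index 9. Going through `df.rdd` converts the data to Row objects and back, so on large DataFrames it may be worth measuring the cost.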