zipWithIndex on an RDD with an initial value

Posted: 2017-08-04 08:49:45

Tags: scala apache-spark rdd

I have an RDD like this:

+----------+--------+
|firstName |lastName|
+----------+--------+
|      john|   smith|
|      anna|  tourde|
+----------+--------+

I want to create a new column the way zipWithIndex does, but with the index starting at 8:

+----------+--------+-----+
|firstName |lastName|index|
+----------+--------+-----+
|      john|   smith|    8|
|      anna|  tourde|    9|
+----------+--------+-----+

Any ideas? Thanks.

2 answers:

Answer 0 (score: 4)

// zipWithIndex pairs each element with a Long index starting at 0; adding 8 shifts the start to 8
rdd.zipWithIndex().map { case (v, ind) =>
  (v, ind + 8)
}
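The mapping in this answer can be tried without a Spark cluster. Below is a minimal sketch using plain Scala collections, assuming the rows are represented as (firstName, lastName) tuples. The helper name `withIndexFrom` is hypothetical. Note that `Seq.zipWithIndex` yields Int indices, while `RDD.zipWithIndex` yields Long, so the sketch widens to Long by adding a Long offset.

```scala
object ZipWithOffset {
  // Hypothetical helper: pair each element with an index that starts at `start`
  // instead of 0, mirroring rdd.zipWithIndex().map { case (v, ind) => (v, ind + 8) }
  def withIndexFrom[A](xs: Seq[A], start: Long): Seq[(A, Long)] =
    xs.zipWithIndex.map { case (v, ind) => (v, ind + start) }

  def main(args: Array[String]): Unit = {
    val rows = Seq(("john", "smith"), ("anna", "tourde"))
    // Prints ((john,smith),8) then ((anna,tourde),9)
    withIndexFrom(rows, 8L).foreach(println)
  }
}
```

In Spark itself, `RDD.zipWithIndex` orders indices by partition index and then by position within each partition, and it triggers an extra Spark job when the RDD has more than one partition, because it needs each partition's size to compute the offsets.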

Answer 1 (score: 2)

Use zipWithIndex, then convert back to a DataFrame, as shown below:

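import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val df1 = spark.createDataFrame(
  // Append (index + 8) as the new last value of each Row
  df.rdd.zipWithIndex.map { case (row, index) =>
    Row.fromSeq(row.toSeq :+ (index + 8))
  },
  // Create schema for index column: original fields plus a non-nullable Long "index"
  StructType(df.schema.fields :+ StructField("index", LongType, nullable = false))
)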
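Whether or not parentheses are written around `index + 8`, the append yields the same row: Scala assigns operator precedence by an operator's first character, and `+` binds tighter than operators beginning with `:`, so `row.toSeq :+ index + 8` parses as `row.toSeq :+ (index + 8)`. The sketch below checks this with a plain `Seq[Any]` standing in for `row.toSeq`, so no Spark dependency is needed; the values are illustrative only.

```scala
object AppendPrecedence {
  // Stand-in for row.toSeq (the values of a Row as a Seq[Any])
  val fields: Seq[Any] = Seq("john", "smith")
  // RDD.zipWithIndex produces Long indices; 0L is the first row's index
  val index: Long = 0L

  // Parsed as fields :+ (index + 8) because + has higher precedence than :+
  val implicitParens: Seq[Any] = fields :+ index + 8
  val explicitParens: Seq[Any] = fields :+ (index + 8)

  def main(args: Array[String]): Unit = {
    println(implicitParens)                   // List(john, smith, 8)
    println(implicitParens == explicitParens) // true
  }
}
```

Writing the parentheses explicitly costs nothing and saves readers from having to recall the precedence table.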