Question

Scala 2.11 here. I have the following input database table:

[input]
===
id BIGINT UNSIGNED NOT NULL,
name VARCHAR(50) NOT NULL,
rank INT NOT NULL

I read some input records into a Spark DataFrame like so:

val inputDf = sqlContext().read
    .format("blah whatever")
    .option("url", "jdbc://blah://whatever")
    .option("query", "SELECT * FROM input WHERE id < 500")
    .load()

So far so good. I now want to iterate over each row in inputDf and apply a transformation to the rank field:

rank = rank * 50

So if the following 3 input records are read in from the DB:

id | name | rank
================
1  | Fizz | 3
2  | Buzz | 14
3  | Foo  | 294

Then the resulting DataFrame needs to look like:

id | name | rank
================
1  | Fizz | 150
2  | Buzz | 700
3  | Foo  | 14700

I believe I can use the map function, something like:

inputDf.map(input =>
  // I believe this gets me the value of the 3rd column (rank):
  input.getInt(3).intValue()

  // Now how to update/set rank as 'rank *= 50' ?
  ???
).collect()

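But I'm having a hard time seeing the forest through the trees. Any ideas? The result should be inputDf with its rank column/field correctly updated/transformed.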

Answer 1

Just use withColumn:

inputDf.withColumn("rank",  inputDf("rank") * 50)

Or select (the $"..." syntax needs import sqlContext.implicits._):

inputDf.select($"id", $"name", ($"rank" * 50).alias("rank"))
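For anyone who wants to try this end to end, here is a minimal sketch run against the three sample rows from the question. It makes several assumptions that are not in the original post: Spark 2.x with a local SparkSession instead of sqlContext, an in-memory Seq standing in for the JDBC read, and id held as a Long (a real BIGINT UNSIGNED column may come back from JDBC as a decimal type). The object name RankTimesFifty and the helpers viaWithColumn and viaMap are made up for illustration. The second helper shows roughly what the map-based attempt in the question would look like; note that Row indices are 0-based, so rank is at index 2, not 3.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical object and method names, for illustration only.
object RankTimesFifty {

  // Column-expression approach from the answer: replace "rank" with rank * 50.
  def viaWithColumn(df: DataFrame): DataFrame =
    df.withColumn("rank", col("rank") * 50)

  // Row-by-row approach: rebuild each Row with the scaled rank,
  // then reattach the original schema to get a DataFrame back.
  def viaMap(spark: SparkSession, df: DataFrame): DataFrame = {
    val rankIdx = df.schema.fieldIndex("rank") // 2 for this table (0-based)
    val rows = df.rdd.map { row =>
      Row.fromSeq(row.toSeq.updated(rankIdx, row.getInt(rankIdx) * 50))
    }
    spark.createDataFrame(rows, df.schema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just to exercise the two helpers.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rank-times-50")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the JDBC read: the three sample rows from the question.
    val inputDf = Seq((1L, "Fizz", 3), (2L, "Buzz", 14), (3L, "Foo", 294))
      .toDF("id", "name", "rank")

    viaWithColumn(inputDf).show()
    viaMap(spark, inputDf).show()

    spark.stop()
  }
}
```

Both helpers should produce the table the question asks for (ranks 150, 700 and 14700). The withColumn version is usually the better choice, since it stays a column expression that Spark can optimize, whereas the map version goes through the RDD API and rebuilds every row by hand.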
