I have a code snippet like this:
```scala
import org.apache.spark.sql.functions.udf
import sqlContext.implicits._ // provides the $"..." column syntax

case class Purchase(cid: Int, pid: String, num: String)
val x = sc.parallelize(Array(
  Purchase(123, "234", "1"),
  Purchase(123, "247", "2"),
  Purchase(189, "254", "3"),
  Purchase(187, "299", "4")
))
// I have a DataFrame with schema: [cid: int, pid: string, num: string]
val df = sqlContext.createDataFrame(x)
// The name of the column I need to transform. Its value can change (e.g. to "pid")
val colName = "num"
// The UDF to apply. Its definition can change too
val toIntUdf = udf((myString: String) => myString.toInt)
// This works, but hard-codes the column name
df.select(toIntUdf($"num")).collect
```
I'm looking for a way to avoid writing `$"num"` literally. Any ideas?
Answer 0 (score: 5)
If you mean that you want to use `colName` instead of the literal `$"num"`, do it like this:
```scala
import org.apache.spark.sql.functions._
df.select(toIntUdf(col(colName))).collect
```
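For readers on Spark 2.x or later, here is a minimal self-contained sketch of the same approach. It assumes a `SparkSession` obtained with `getOrCreate()` (in `spark-shell` this just returns the existing `spark` session) instead of the `sc`/`sqlContext` pair used in the question. The `.as(colName)` rename is an addition not in the answer; it keeps the original column name on the output instead of an auto-generated one.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

case class Purchase(cid: Int, pid: String, num: String)

// Reuses an existing session if there is one, otherwise starts a local one (assumption: Spark 2.x+).
val spark = SparkSession.builder().master("local[*]").appName("column-by-name").getOrCreate()
import spark.implicits._

val df = Seq(
  Purchase(123, "234", "1"),
  Purchase(123, "247", "2"),
  Purchase(189, "254", "3"),
  Purchase(187, "299", "4")
).toDF()

// The column name lives in a variable rather than in a $"..." literal.
val colName = "num"
val toIntUdf = udf((myString: String) => myString.toInt)

// col(colName) refers to the same column as $"num"; .as(colName) keeps the name "num".
val result = df.select(toIntUdf(col(colName)).as(colName))
val values = result.collect().map(_.getInt(0))
```

Setting `colName` to another string column such as `"pid"` converts that column instead, with no change to the `select` call.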
Answer 1 (score: 1)
You can select the column this way (see Spark's `DataFrame` documentation for more details):

```scala
df.select(toIntUdf(df(colName)))
```
Or:

```scala
df.select(toIntUdf(df.col(colName)))
```
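If the goal is to convert the column while keeping the rest of the DataFrame, `withColumn` can be combined with `df(colName)`. The sketch below wraps this in a hypothetical helper, `transformColumn` (not a Spark API), that accepts any single-argument `UserDefinedFunction`, so both the column name and the UDF can vary as the question requires. It assumes Spark 2.x or later with a local `SparkSession`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

// Hypothetical helper: replaces column `colName` with f(column) and keeps every other column.
def transformColumn(df: DataFrame, colName: String, f: UserDefinedFunction): DataFrame =
  df.withColumn(colName, f(df(colName)))

val spark = SparkSession.builder().master("local[*]").appName("transform-column").getOrCreate()
import spark.implicits._

val df = Seq((123, "234", "1"), (189, "254", "3")).toDF("cid", "pid", "num")
val toIntUdf = udf((s: String) => s.toInt)

// Convert "num" to int; "cid" and "pid" are carried along unchanged.
val converted = transformColumn(df, "num", toIntUdf)
// The same helper works for a different column without any code changes.
val convertedPid = transformColumn(converted, "pid", toIntUdf)
```

When `withColumn` is given an existing column name it replaces that column in place, whereas the `select` calls in the answers return only the transformed column.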