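I'm trying to convert a day-of-week name given as a string (e.g. TUESDAY) to an integer (e.g. 3). I wrote the Map below, but I'm not sure how to apply it to a DataFrame column.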
import org.apache.spark.sql.functions.lower

val dayNameToInteger = Map(
  "sunday" -> 1,
  "monday" -> 2,
  "tuesday" -> 3,
  "wednesday" -> 4,
  "thursday" -> 5,
  "friday" -> 6,
  "saturday" -> 7)

val input = sqlContext.createDataFrame(Seq(
  (0L, "SUNDAY", 34),
  (1L, "Monday", 31),
  (2L, "tuesday", 25)
)).toDF("id", "day_of_week", "value")
scala> input.show
+---+-----------+-----+
| id|day_of_week|value|
+---+-----------+-----+
|  0|     SUNDAY|   34|
|  1|     Monday|   31|
|  2|    tuesday|   25|
+---+-----------+-----+
var output = input.select($"id", dayNameToInteger(lower(input("day_of_week"))))
<console>:27: error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: String
var output = input.select($"id", dayNameToInteger(lower(input("day_of_week"))))
Answer 0 (score: 0)
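The call fails because dayNameToInteger expects a String key, while lower(input("day_of_week")) is a Column. Do the conversion in a UDF instead, which Spark applies to the column's value in every row.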
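import org.apache.spark.sql.functions.{lower, udf}

// Wrap the lookup in a UDF so Spark applies it to each row's value.
// Any input not listed below (including null) throws scala.MatchError at runtime.
val dayToInt = udf((dayOfWeek: String) => {
  dayOfWeek match {
    case "sunday" => 1
    case "monday" => 2
    case "tuesday" => 3
    case "wednesday" => 4
    case "thursday" => 5
    case "friday" => 6
    case "saturday" => 7
  }
})

// lower(...) normalises the case before the UDF sees the value.
val output = input.select($"id", dayToInt(lower(input("day_of_week"))).as("day_int"))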
scala> output.show
+---+-------+
| id|day_int|
+---+-------+
|  0|      1|
|  1|      2|
|  2|      3|
+---+-------+
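A pattern match inside a UDF throws on any value it does not list. The following is a minimal sketch of a more defensive variant that reuses the Map from the question. The names dayNameToInt and safeDayToInt are hypothetical, and the sketch assumes a Spark version in which a Scala UDF may return Option[Int] (stored as a nullable integer column).

```scala
import java.util.Locale
import org.apache.spark.sql.functions.{col, udf}

// Same lookup table as in the question.
val dayNameToInteger: Map[String, Int] = Map(
  "sunday" -> 1,
  "monday" -> 2,
  "tuesday" -> 3,
  "wednesday" -> 4,
  "thursday" -> 5,
  "friday" -> 6,
  "saturday" -> 7)

// Hypothetical helper: plain Scala, so it can be checked without Spark.
// Returns None for null or unrecognised names instead of throwing.
val dayNameToInt: String => Option[Int] = name =>
  Option(name).flatMap(n => dayNameToInteger.get(n.trim.toLowerCase(Locale.ROOT)))

// Wrap the helper as a UDF; None becomes null in the resulting column.
val safeDayToInt = udf(dayNameToInt)

// Usage, assuming the input DataFrame from the question:
// val output = input.select(col("id"), safeDayToInt(col("day_of_week")).as("day_int"))
```

Because the helper lower-cases and trims the value itself, the explicit lower(...) call is no longer required, and unexpected values yield null rather than failing the job.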
Answer 1 (score: 0)
As an improvement on the answer above, I use a Spark 2.x-compatible UDF that indexes the day-of-week string to an integer. I also use it for machine learning models.
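The answerer's own code is not available here, so what follows is only a sketch of how the same indexing could be done on Spark 2.x. It swaps in a different technique: a built-in map-literal lookup via typedLit rather than a UDF, which assumes Spark 2.2 or later. The object name DayOfWeekIndex, the method withDayInt, and the local master setting are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lower, typedLit}

object DayOfWeekIndex {
  // Same lookup table as in the question.
  val dayNameToInteger: Map[String, Int] = Map(
    "sunday" -> 1, "monday" -> 2, "tuesday" -> 3, "wednesday" -> 4,
    "thursday" -> 5, "friday" -> 6, "saturday" -> 7)

  // Hypothetical helper: adds a day_int column by looking up the
  // lower-cased day name in a map-typed literal column (no UDF involved).
  def withDayInt(df: DataFrame): DataFrame = {
    val dayMap = typedLit(dayNameToInteger)
    df.withColumn("day_int", dayMap(lower(col("day_of_week"))))
  }

  def main(args: Array[String]): Unit = {
    // Spark 2.x entry point; local master is assumed for a quick trial run.
    val spark = SparkSession.builder()
      .appName("day-of-week-index")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val input = Seq(
      (0L, "SUNDAY", 34),
      (1L, "Monday", 31),
      (2L, "tuesday", 25)
    ).toDF("id", "day_of_week", "value")

    withDayInt(input).show()
    spark.stop()
  }
}
```

Keeping the lookup as a built-in expression lets the optimizer see it, which a UDF hides. The resulting integer column can then be passed on to a feature pipeline like any other numeric column.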