The dataset looks like the following. I'm stuck converting the HIRE_DATE
field to a date type.
EMPLOYEE_ID,FIRST_NAME,LAST_NAME,EMAIL,PHONE_NUMBER,HIRE_DATE,JOB_ID,SALARY,COMMISSION_PCT,MANAGER_ID,DEPARTMENT_ID
100,Steven,King,SKING,515.123.4567,17-JUN-03,AD_PRES,24000, - , - ,90
101,Neena,Kochhar,NKOCHHAR,515.123.4568,21-SEP-05,AD_VP,17000, - ,100,90
And the code snippet:
val empData = sparkSession.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(filePath)
empData.printSchema()
The printSchema output shows the HIRE_DATE
field as a string, but I expect it to be a date
field. How can I change it?
Answer 0 (score: 2)
Here is my approach:
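import java.text.SimpleDateFormat
import java.util.Locale
import org.apache.spark.sql.functions.udf
import sparkSession.implicits._ // enables the $"colName" syntax

// Month abbreviations such as "JUN" are English, so pin the locale
val dateFormat = new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH)

// Parse "17-JUN-03" and wrap it as java.sql.Date, which Spark maps to DateType
def convertStringToDate(stringDate: String): java.sql.Date = {
  val parsed = dateFormat.parse(stringDate)
  new java.sql.Date(parsed.getTime())
}
val convertStringToDateUDF = udf(convertStringToDate _)
val result = empData.withColumn("HIRE_DATE", convertStringToDateUDF($"HIRE_DATE"))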
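SimpleDateFormat is not thread-safe, so sharing one instance can be fragile if it ends up used by several threads. A java.time variant of the same helper avoids that. This is a minimal sketch, assuming Java 8+ and English month abbreviations; the object name `HireDateParsing` is hypothetical:

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.util.Locale

// Hypothetical helper object, not part of the original answer.
object HireDateParsing {
  // DateTimeFormatter is immutable and thread-safe, unlike SimpleDateFormat.
  // parseCaseInsensitive() lets "JUN" match the English abbreviation "Jun".
  private val hireDateFormat: DateTimeFormatter =
    new DateTimeFormatterBuilder()
      .parseCaseInsensitive()
      .appendPattern("dd-MMM-yy")
      .toFormatter(Locale.ENGLISH)

  // Converts a value such as "17-JUN-03" into a java.sql.Date for 2003-06-17.
  // A null input yields null instead of throwing.
  def convertStringToDate(s: String): java.sql.Date =
    if (s == null) null
    else java.sql.Date.valueOf(LocalDate.parse(s.trim, hireDateFormat))
}
```

It should plug into the same UDF wiring, e.g. `udf(HireDateParsing.convertStringToDate _)`.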
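Answer 1 (score: 1)
Spark has its own date type. A date string in the "yyyy-MM-dd" format can be converted to Spark's DateType, so all you need to do is convert the input date string to "yyyy-MM-dd" format. For date and time formatting, it's better to use the java.time library. See below: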
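import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale
import org.apache.spark.sql.functions.{to_date, udf}
import spark.implicits._ // enables the $"colName" syntax

val df = spark.read.option("inferSchema", true).option("header", true).csv("in/emp2.txt")

// "17-JUN-03" -> "17-Jun-03" -> "2003-06-17"; ofPattern matches month names case-sensitively
def formatDate(x: String): String = {
  val y = x.toLowerCase.split('-').map(_.capitalize).mkString("-")
  val z = LocalDate.parse(y, DateTimeFormatter.ofPattern("dd-MMM-yy", Locale.ENGLISH))
  z.toString // ISO format yyyy-MM-dd
}

val myudfDate = udf(formatDate _)
// to_date turns the "yyyy-MM-dd" string into a DateType column (date_format would return a string)
val df2 = df.withColumn("HIRE_DATE2", to_date(myudfDate($"HIRE_DATE")))
df2.show(false)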
+-----------+----------+---------+--------+------------+---------+-------+------+--------------+----------+-------------+----------+
|EMPLOYEE_ID|FIRST_NAME|LAST_NAME|EMAIL |PHONE_NUMBER|HIRE_DATE|JOB_ID |SALARY|COMMISSION_PCT|MANAGER_ID|DEPARTMENT_ID|HIRE_DATE2|
+-----------+----------+---------+--------+------------+---------+-------+------+--------------+----------+-------------+----------+
|100 |Steven |King |SKING |515.123.4567|17-JUN-03|AD_PRES|24000 | - | - |90 |2003-06-17|
|101 |Neena |Kochhar |NKOCHHAR|515.123.4568|21-SEP-05|AD_VP |17000 | - |100 |90 |2005-09-21|
+-----------+----------+---------+--------+------------+---------+-------+------+--------------+----------+-------------+----------+
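The `toLowerCase`/`capitalize` step in `formatDate` is needed because `DateTimeFormatter.ofPattern` matches month names case-sensitively, so "JUN" does not match "Jun". The sketch below (the object name `MonthCaseDemo` is hypothetical, and Java 8+ with English month names is assumed) contrasts that with a formatter built with `parseCaseInsensitive()`, which would make the re-capitalizing step unnecessary:

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.util.Locale
import scala.util.Try

// Hypothetical demo object, not part of the original answer.
object MonthCaseDemo {
  // Default formatter: text fields such as MMM are matched case-sensitively.
  val caseSensitive: DateTimeFormatter =
    DateTimeFormatter.ofPattern("dd-MMM-yy", Locale.ENGLISH)

  // Same pattern, but accepts "JUN", "Jun" or "jun".
  val caseInsensitive: DateTimeFormatter =
    new DateTimeFormatterBuilder()
      .parseCaseInsensitive()
      .appendPattern("dd-MMM-yy")
      .toFormatter(Locale.ENGLISH)

  // Parses with the given formatter and renders as ISO "yyyy-MM-dd";
  // returns None when parsing fails.
  def toIso(s: String, f: DateTimeFormatter): Option[String] =
    Try(LocalDate.parse(s, f).toString).toOption
}
```

With the case-insensitive formatter, `formatDate` could reduce to `LocalDate.parse(x, caseInsensitive).toString`.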