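I am reading a DataFrame from a CSV file whose first column is the event date and time, for example:
2016-08-08 07:45:28+03
In the code below, is it possible to specify in the schema definition how such a string should be converted to a date?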
val df: DataFrame = spark.read.options(Map(
  "header" -> "true"
)).schema(StructType(
  StructField("EventTime", DataTypes.DateType, false) ::
  Nil
)).csv("C:/qos1h.csv")
This code fails with:
java.lang.NumberFormatException: For input string: "28+03"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at java.sql.Timestamp.valueOf(Timestamp.java:259)
at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:135)
at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:291)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:115)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:84)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$1.apply(CSVFileFormat.scala:125)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$1.apply(CSVFileFormat.scala:124)
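Answer 0 (score: 1)
It looks like the schema definition offers no way to specify the string-to-date conversion. However, the documentation for the DataFrameReader.csv method describes a dateFormat option that sets the date format string applied to every DateType field.
Here is the fixed code: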
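import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

val df: DataFrame = spark.read.options(Map(
  "header" -> "true",
  // Pattern for values such as 2016-08-08 07:45:28+03; X matches the +03 offset
  "dateFormat" -> "yyyy-MM-dd HH:mm:ssX"
)).schema(StructType(
  StructField("EventTime", DataTypes.DateType, false) ::
  Nil
)).csv("C:/qos1h.csv")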
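To check the pattern without starting Spark, it can be tried against the sample value directly. The sketch below is only an illustration: it assumes the dateFormat string follows java.text.SimpleDateFormat pattern syntax (which is what Spark 2.x accepts for this option), and the object name EventTimePattern and its helper methods are made up for this example.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Hypothetical helper, not part of Spark: it only exercises the pattern string.
object EventTimePattern {
  // Same pattern string as the dateFormat option above.
  val Pattern = "yyyy-MM-dd HH:mm:ssX"

  // SimpleDateFormat is not thread-safe, so build a fresh instance per call.
  private def newFormat(): SimpleDateFormat = {
    val fmt = new SimpleDateFormat(Pattern)
    // Format in UTC so the printed result does not depend on the local zone.
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt
  }

  // Parse a value such as "2016-08-08 07:45:28+03" into an instant.
  def parse(s: String): Date = newFormat().parse(s)

  // Render an instant back with the same pattern, in UTC.
  def formatUtc(d: Date): String = newFormat().format(d)

  def main(args: Array[String]): Unit = {
    val parsed = parse("2016-08-08 07:45:28+03")
    println(formatUtc(parsed)) // prints 2016-08-08 04:45:28Z
  }
}
```

The single X accepts the two-digit +03 offset that java.sql.Timestamp.valueOf rejected in the stack trace. Note that because the column is declared as DateType, only the calendar date is retained after loading; if the time of day matters, declaring the column as TimestampType is likely the better fit, and Spark 2.1 and later read that format from a separate timestampFormat option.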