I am trying to parse a CSV file with the following contents into a DataFrame:
+------+---------+----------+
|Symbol| Date|ClosePrice|
+------+---------+----------+
| SREEL| 1-Jan-14| 298.0|
| SREEL| 2-Jan-14| 299.9|
+------+---------+----------+
However, with the code snippet below I am unable to convert the Date field into a timestamp field; it gives me incorrect results.
Can anyone help me understand why?
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.unix_timestamp
import org.apache.spark.sql.types.{DoubleType, TimestampType}

val sparkConf = new SparkConf().setAppName("TimeSeriesForecast").setMaster("local")
sparkConf.set("spark.sql.shuffle.partitions", "4")
val sparkContext = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sparkContext)
import sqlContext.implicits._

// Read the CSV with the spark-csv package, letting it infer column types.
val stockDF: DataFrame = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("data/Sreeleathers_Share_Price.csv")

// Keep only the columns of interest, renaming "Close Price" to "ClosePrice".
val priceDF: DataFrame = stockDF.select(
  stockDF("Symbol"),
  stockDF("Date"),
  stockDF("Close Price").as("ClosePrice"))
//priceDF.printSchema
//priceDF.show

// Parse "1-Jan-14"-style strings into timestamps and cast the price to Double.
val finalDf: DataFrame = priceDF
  .withColumn("Price", priceDF("ClosePrice").cast(DoubleType))
  .withColumn("TimeStamp", unix_timestamp($"Date", "d-MMM-yy").cast(TimestampType))
  .drop("Date").drop("ClosePrice")
  .sort("TimeStamp")
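For reference, a quick way to see whether the parse itself failed is to count null timestamps (a minimal check against the finalDf above): unix_timestamp returns null whenever the Date string does not match the pattern, so failed rows show up as nulls.

finalDf.filter($"TimeStamp".isNull).count() // number of rows whose Date failed to parse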
Answer 0 (score: 0)
I tried the following on Spark 1.6 and it seems to work. Posting this as an answer because it is too long for a comment.
// Assumes the usual Spark 1.6 imports and implicits are in scope:
import sqlContext.implicits._
import org.apache.spark.sql.functions.unix_timestamp
import org.apache.spark.sql.types.TimestampType

val myDF = Seq(("1-Jan-14", 2, 1L), ("2-Jan-14", 1, 2L)).toDF("Date", "col2", "col3")
myDF.show()
+--------+----+----+
| Date|col2|col3|
+--------+----+----+
|1-Jan-14| 2| 1|
|2-Jan-14| 1| 2|
+--------+----+----+
myDF.withColumn("TimeStamp", unix_timestamp($"Date","d-MMM-yy").cast(TimestampType)).show()
+--------+----+----+--------------------+
| Date|col2|col3| TimeStamp|
+--------+----+----+--------------------+
|1-Jan-14| 2| 1|2014-01-01 00:00:...|
|2-Jan-14| 1| 2|2014-01-02 00:00:...|
+--------+----+----+--------------------+
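One caveat worth adding: in these Spark versions unix_timestamp parses with java.text.SimpleDateFormat under the hood, so a month-name pattern like d-MMM-yy depends on the JVM's default locale. Under a non-English locale, "Jan"/"Feb" will not match the MMM token and the column silently comes back null. A blunt way to rule this out (a sketch, assuming the locale really is the culprit) is to force an English default locale before parsing:

import java.util.Locale

// Month abbreviations such as "Jan" only match the MMM pattern under an
// English locale; set it explicitly before calling unix_timestamp.
Locale.setDefault(Locale.ENGLISH)

If that fixes the timestamps, a less global alternative is a UDF that wraps new SimpleDateFormat("d-MMM-yy", Locale.ENGLISH).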