I have some data in an array of strings like the following (example only):
val myArray = Array("1499955986039", "1499955986051", "1499955986122")
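I want to map my list to an array of Timestamps so that I can create an RDD (myRdd) and then build a DataFrame like this:
val df = spark.createDataFrame(myRdd, StructType(Seq(StructField("myTymeStamp", TimestampType, true))))
My question is not how to create the RDD, but how to replace the strings with millisecond timestamps. Any ideas? Thanks.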
Answer 0 (score: 8)
val myArray = Array("1499955986039", "1499955986051", "1499955986122")
import java.sql.Timestamp
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, TimestampType}
val rdd = sc.parallelize(myArray).map(s => Row(new Timestamp(s.toLong)))
val schema = StructType(Array(StructField("myTymeStamp", TimestampType, true)))
spark.createDataFrame(rdd, schema)
// res25: org.apache.spark.sql.DataFrame = [myTymeStamp: timestamp]
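If some of the input strings might not be valid numbers, the conversion can be guarded before any rows are built. Below is a minimal sketch in plain Scala (no Spark required); the helper name `parseMillis` and the deliberately bad sample entry are assumptions made for illustration, not part of the answer above.

```scala
import java.sql.Timestamp
import scala.util.Try

// Hypothetical helper: parse an epoch-milliseconds string,
// returning None when the string is not a valid Long.
def parseMillis(s: String): Option[Timestamp] =
  Try(s.trim.toLong).toOption.map(ms => new Timestamp(ms))

// Sample input; "not-a-number" is added only to exercise the failure path.
val raw = Seq("1499955986039", "1499955986051", "not-a-number")

// flatMap keeps the successfully parsed values and drops the rest.
val timestamps: Seq[Timestamp] = raw.flatMap(s => parseMillis(s))

// getTime returns the epoch milliseconds each Timestamp was built from.
timestamps.foreach(ts => println(ts.getTime))
```

The resulting `Timestamp` values could then be wrapped in `Row` objects in the same way as in the answer.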
Answer 1 (score: 3)
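You don't need to build Timestamp objects beforehand. Convert each string to a Long, load it as a LongType column, and cast that column to timestamp afterwards. Two caveats: declaring TimestampType in the schema for a Row that holds a Long fails at runtime (in Spark 2.x the encoder reports that java.lang.Long is not a valid external type for timestamp), and a numeric-to-timestamp cast reads the value as seconds, so the milliseconds are divided by 1000 first:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StructType, StructField, LongType, TimestampType}
val myArray = Array("1499955986039", "1499955986051", "1499955986122")
val myrdd = spark.sparkContext.parallelize(myArray.map(a => Row(a.toLong)))
val longDf = spark.createDataFrame(myrdd, StructType(Seq(StructField("myTymeStamp", LongType, true))))
// a numeric-to-timestamp cast reads seconds, so scale the milliseconds down first
val df = longDf.withColumn("myTymeStamp", (col("myTymeStamp") / 1000).cast(TimestampType))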
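Alternatively, you can create the DataFrame from the strings and cast the column to timestamp afterwards. The RDD has to hold the raw strings in that case:
import org.apache.spark.sql.types.{StructType, StructField, StringType, LongType, TimestampType}
import spark.implicits._ // enables the $"col" syntax
val stringRdd = spark.sparkContext.parallelize(myArray.map(a => Row(a)))
val stringDf = spark.createDataFrame(stringRdd, StructType(Seq(StructField("myTymeStamp", StringType, true))))
// cast myTymeStamp from String to Long, scale milliseconds to seconds, then cast to timestamp
stringDf.withColumn("myTymeStamp", ($"myTymeStamp".cast(LongType) / 1000).cast(TimestampType))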
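Note that Spark treats a number cast to timestamp as seconds since the epoch, so millisecond values need to be scaled by 1000 before casting. The arithmetic can be checked in plain Scala with `java.time`; this is only a sketch, and the helper name `splitMillis` is a hypothetical chosen for illustration.

```scala
import java.time.Instant

// Hypothetical helper: split epoch milliseconds into whole seconds
// and the leftover fraction expressed in nanoseconds.
def splitMillis(ms: Long): (Long, Long) =
  (Math.floorDiv(ms, 1000L), Math.floorMod(ms, 1000L) * 1000000L)

val ms = "1499955986039".toLong
val (seconds, nanos) = splitMillis(ms)

// Both constructions describe the same instant.
val fromMillis = Instant.ofEpochMilli(ms)
val fromSeconds = Instant.ofEpochSecond(seconds, nanos)
println(fromMillis == fromSeconds)

// Reading the raw millisecond value as seconds lands far in the future instead.
val misread = Instant.ofEpochSecond(ms)
println(misread.isAfter(fromMillis))
```

Dividing a column by 1000 in Spark produces a double, so the resulting timestamp can differ from the exact value by a microsecond; constructing `java.sql.Timestamp` values directly from the Long avoids that rounding.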
Hope this helps!