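I am saving the DataFrame below from Spark 1.6 to a Phoenix table. The problem I am facing is that withColumn("create_ts", current_timestamp()) inserts the same timestamp for the entire DataFrame. Please see the example below.
I want every row in each job to have a unique timestamp with millisecond precision, because the identical timestamps cause a lot of data to be overwritten due to duplicate composite keys.
Sample data: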
stagedDataFrame
  .select($"RemoteID", $"TagName", $"TagValueTs", $"Value", $"TagTypeName")
  .withColumn("job_name", lit(s"${etlStatistics2.sqlContext.sparkContext.appName}_${etlStatistics2.sqlContext.sparkContext.applicationId}"))
  .withColumn("create_ts", current_timestamp())
  .withColumn("record_count", lit(etlStatistics2.head().getLong(3)))
  .select($"job_name", $"create_ts", $"record_count", $"RemoteID" as "remoteid", $"TagName" as "tagname", $"TagValueTs" as "tagvalue_ts", $"Value" as "value", $"TagTypeName" as "tagtypename")
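One possible way to get a distinct millisecond value per row is sketched below. It is not taken from the original post: it assumes it is acceptable to capture one base time on the driver and add each row's index (from RDD.zipWithIndex) as a millisecond offset. The names UniqueTimestamps, uniqueTimestamp, and withUniqueTimestamp are hypothetical, chosen only for illustration.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{StructField, StructType, TimestampType}

// Hypothetical helper object; none of these names come from Spark or Phoenix.
object UniqueTimestamps {

  // Offsets a base time by the row index, in milliseconds.
  def uniqueTimestamp(baseMillis: Long, rowIndex: Long): Timestamp =
    new Timestamp(baseMillis + rowIndex)

  // Appends a timestamp column whose value is different for every row
  // of this DataFrame (assumption: uniqueness is only needed within one job).
  def withUniqueTimestamp(df: DataFrame, colName: String): DataFrame = {
    // Captured once on the driver, then shipped to the executors.
    val baseMillis = System.currentTimeMillis()

    // zipWithIndex assigns each row a distinct, consecutive Long index.
    val rows = df.rdd.zipWithIndex().map { case (row, idx) =>
      Row.fromSeq(row.toSeq :+ uniqueTimestamp(baseMillis, idx))
    }

    // Original schema plus the new timestamp column at the end.
    val schema = StructType(
      df.schema.fields :+ StructField(colName, TimestampType, nullable = false))

    df.sqlContext.createDataFrame(rows, schema)
  }
}
```

With this sketch, the .withColumn("create_ts", current_timestamp()) step would be dropped and the intermediate DataFrame passed through UniqueTimestamps.withUniqueTimestamp(df, "create_ts") before the final select. Two caveats: the values are synthetic rather than true wall-clock times (row 1,000,000 lands about 16.7 minutes after the base time), and zipWithIndex runs an extra Spark job when the RDD has more than one partition.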
Code:
socket