Question

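My DF has a startTimeUnix column (of type Number in Mongo) that contains epoch timestamps. I want to query the DF on this column, but I want to pass in an EST datetime. I jumped through a lot of hoops to test the following in spark-shell: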

val df = Seq(("1", "1523937600000"), ("2", "1523941200000"),("3","1524024000000")).toDF("id", "unix")

df.filter($"unix" > java.time.ZonedDateTime.parse("04/17/2018 01:00:00", java.time.format.DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss").withZone(java.time.ZoneId.of("America/New_York"))).toEpochSecond() * 1000).collect()

Output:

= Array([3,1524024000000])
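The spark-shell test above computes the bound on the driver with java.time and compares the column against a plain number. Factored into a small helper (the object and method names here are hypothetical, not from the question), the same computation looks like this:

```scala
import java.time.{ZoneId, ZonedDateTime}
import java.time.format.DateTimeFormatter

object NyEpoch {
  // Hypothetical helper: interprets a wall-clock string as America/New_York
  // local time and returns epoch milliseconds, the unit stored in the unix column.
  def nyEpochMillis(text: String, pattern: String): Long = {
    val fmt = DateTimeFormatter.ofPattern(pattern).withZone(ZoneId.of("America/New_York"))
    ZonedDateTime.parse(text, fmt).toEpochSecond * 1000L
  }

  def main(args: Array[String]): Unit = {
    // Same input as the spark-shell filter in the question.
    println(nyEpochMillis("04/17/2018 01:00:00", "MM/dd/yyyy HH:mm:ss")) // 1523941200000
  }
}
```

Row 2's timestamp equals this bound exactly, which is why the strict > comparison returns only row 3.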

Since the java.time functions work, I pass them to spark-submit, where the filter query used when retrieving the data from Mongo looks like this:

startTimeUnix < (java.time.ZonedDateTime.parse(${LT}, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000) AND startTimeUnix > (java.time.ZonedDateTime.parse(${GT}, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000)

However, I keep getting the following error:

Caused by: org.apache.spark.sql.catalyst.parser.ParseException: no viable alternative at input '(java.time.ZonedDateTime.parse(04/18/201000000, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone('(line 1, pos 138)

== SQL ==
startTimeUnix < (java.time.ZonedDateTime.parse(04/18/2017000000, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000).toString() AND startTimeUnix > (java.time.ZonedDateTime.parse(04/17/2017000000, java.time.format.DateTimeFormatter.ofPattern('MM/dd/yyyyHHmmss').withZone(java.time.ZoneId.of('America/New_York'))).toEpochSecond()*1000).toString()
    at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseExpression(ParseDriver.scala:43)
    at org.apache.spark.sql.Dataset.filter(Dataset.scala:1315)

I read somewhere that this error indicates a data type mismatch. I tried applying toString to the output of the date conversion, with no luck.
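The stack trace points at the actual cause: Dataset.filter(String) hands its argument to Spark's SQL expression parser (AbstractSqlParser.parseExpression), and java.time.ZonedDateTime.parse(...) is JVM code, not SQL, so the parser rejects it; the problem is syntax, not a data type mismatch. One workaround, sketched below under the assumption that ${LT} and ${GT} arrive as strings in the MM/dd/yyyyHHmmss form shown in the error, is to compute both bounds on the driver and interpolate only the resulting numbers into the filter string. The object and method names are hypothetical.

```scala
import java.time.{ZoneId, ZonedDateTime}
import java.time.format.DateTimeFormatter

object MongoFilter {
  // Same pattern and zone as the question's filter query.
  private val fmt =
    DateTimeFormatter.ofPattern("MM/dd/yyyyHHmmss").withZone(ZoneId.of("America/New_York"))

  // Converts e.g. "04/18/2018000000" (New York wall-clock time) to epoch milliseconds.
  def toMillis(text: String): Long =
    ZonedDateTime.parse(text, fmt).toEpochSecond * 1000L

  // Builds a condition containing only a column name and numeric literals,
  // which Spark's SQL expression parser accepts.
  def condition(lt: String, gt: String): String =
    s"startTimeUnix < ${toMillis(lt)} AND startTimeUnix > ${toMillis(gt)}"

  def main(args: Array[String]): Unit = {
    // startTimeUnix < 1524024000000 AND startTimeUnix > 1523937600000
    println(condition("04/18/2018000000", "04/17/2018000000"))
  }
}
```

The resulting string can be passed to df.filter as before. Alternatively, the Long bounds can be used with the Column API, exactly as the spark-shell test above does, which avoids building SQL text at all.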

Answer 1

You can use the Spark DataFrame functions:

scala> val df = Seq(("1", "1523937600000"), ("2", "1523941200000"),("3","1524024000000")).toDF("id", "unix")
df: org.apache.spark.sql.DataFrame = [id: string, unix: string]

scala> df.filter($"unix" > unix_timestamp()*1000).collect()
res5: Array[org.apache.spark.sql.Row] = Array([3,1524024000000])
scala> df.withColumn("unixinEST"
                        ,from_utc_timestamp(
                            from_unixtime(unix_timestamp()),
                             "EST"))
         .show()
+---+-------------+-------------------+
| id|         unix|          unixinEST|
+---+-------------+-------------------+
|  1|1523937600000|2018-04-18 06:13:19|
|  2|1523941200000|2018-04-18 06:13:19|
|  3|1524024000000|2018-04-18 06:13:19|
+---+-------------+-------------------+
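Note that unix_timestamp() called with no arguments returns the current Unix timestamp, which is why every row of unixinEST shows the same value (the time the command ran) rather than a conversion of the unix column. The per-row conversion that column seems to aim for can be checked outside Spark with java.time; the sketch below renders the three sample values as New York wall-clock time. It uses America/New_York rather than EST because, in Java's time-zone data, EST is a fixed UTC-5 offset with no daylight saving time, so in April it runs one hour behind New York's clock (assuming Spark resolves the id the same way). The object name is hypothetical.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object UnixToNewYork {
  private val ny = ZoneId.of("America/New_York")

  // Interprets the string as epoch milliseconds and returns New York local time.
  def toNewYork(unixMillis: String): LocalDateTime =
    Instant.ofEpochMilli(unixMillis.toLong).atZone(ny).toLocalDateTime

  def main(args: Array[String]): Unit = {
    // Sample values from the df defined above.
    Seq("1523937600000", "1523941200000", "1524024000000")
      .foreach(u => println(s"$u -> ${toNewYork(u)}"))
  }
}
```

This prints 2018-04-17T00:00, 2018-04-17T01:00 and 2018-04-18T00:00 for the three rows.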

Answer 2

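Your comments were helpful, @vijay-krishna. My understanding is that the error means I can only pass Hive SQL functions in the filter string. I used unix_timestamp(string, format) to get around the error and did without the time zone conversion.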
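Doing without the time zone conversion shifts the bounds. unix_timestamp(string, format) interprets the string in the Spark session's time zone (spark.sql.session.timeZone, which defaults to the JVM's zone), so unless that zone is New York the result differs from the America/New_York value computed in the question. The sketch below assumes a UTC session zone purely for illustration and measures the gap for the same wall-clock string; the object and method names are hypothetical.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

object ZoneGap {
  private val fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss")

  // Epoch milliseconds for a wall-clock string interpreted in the given zone.
  def millisIn(text: String, zone: ZoneId): Long =
    LocalDateTime.parse(text, fmt).atZone(zone).toInstant.toEpochMilli

  def main(args: Array[String]): Unit = {
    val text = "04/17/2018 01:00:00"
    val ny  = millisIn(text, ZoneId.of("America/New_York")) // 1523941200000
    val utc = millisIn(text, ZoneOffset.UTC)                // assumed session zone
    println(s"New York: $ny, UTC: $utc, gap: ${(ny - utc) / 3600000} h")
  }
}
```

A four-hour gap in April (five hours in winter, when New York is on UTC-5) means rows near either boundary can be included or excluded incorrectly.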
