Converting a Cassandra timestamp column to timeuuid

Date: 2019-05-09 11:29:29

Tags: scala apache-spark cassandra timeuuid

I am getting events from Kafka and storing them in Cassandra. I parse the json, which contains the fields eventID, sessionID, timestamp, userID, to create columns for a Cassandra table that looks like this:

cassandra@cqlsh> CREATE TABLE mydata.events (
   ...     "event_date" date,
   ...     "eventID" text,
   ...     "userID" text,
   ...     timestamp timeuuid,
   ...     "sessionID" text,
   ...     "fullJson" text,
   ...     PRIMARY KEY ("event_date", timestamp, "sessionID")
   ... );

And in code:

case class cassandraFormat(
  eventID: String,
  sessionID: String,
  timeuuid: UUID,        // timestamp as timeuuid
  userID: String,
  event_date: LocalDate, // YYYY-MM-dd format
  fullJson: String       // full json from Kafka
)

I need to add the timestamp column as timeuuid. Since I am parsing it from json, I extract all values from the header and create the columns this way:

 val allJson = rdd.
            map(x => {
              implicit val formats: DefaultFormats.type = org.json4s.DefaultFormats
              //use serialization default to format a Map to JSON
              (x, Serialization.write(x))
            }).
            filter(x => x._1 isDefinedAt "header").
            map(x => (x._1("header"), x._2)).
            filter(x => (x._1 isDefinedAt "userID") &&
              (x._1 isDefinedAt "eventID") &&
              (x._1 isDefinedAt "sessionID") &&
              (x._1 isDefinedAt "timestamp")).
            map(x => cassFormat(x._1("eventID").toString,
              x._1("sessionID").toString,
              com.datastax.driver.core.utils.UUIDs.startOf(x._1("timestamp").toString.toLong),
              x._1("userID").toString,
              com.datastax.driver.core.LocalDate.fromMillisSinceEpoch(x._1("timestamp").toString.toLong),
              x._2))

This part:

com.datastax.driver.core.utils.UUIDs.startOf(x._1("timestamp").toString.toLong)

is producing the error

java.lang.NumberFormatException: For input string: "2019-05-09T09:00:52.553+0000" java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

I even tried:

java.util.UUID.fromString(x._1("timestamp").toString)

which also produces the same error. How do I properly cast/convert timestamp to timeuuid and insert it into Cassandra via a Spark job?

3 answers:

Answer 0 (score: 2)

I solved this using a UDF.

import com.datastax.driver.core.utils.UUIDs
import org.apache.spark.sql.functions.udf
 
val toTimeuuid: java.sql.Timestamp => String = x => UUIDs.startOf(x.getTime()).toString()
val fromTimeuuid: String => java.sql.Timestamp = x => new java.sql.Timestamp(UUIDs.unixTimestamp(java.util.UUID.fromString(x)))
 
val toTimeuuidUDF = udf(toTimeuuid)
val fromTimeuuidUDF = udf(fromTimeuuid)

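Answer 1 (score: 0)

You have a string that is not a number, and you are trying to convert it to one with toLong. Hence the exception.

From this, it looks like you can use this method to get a UUID based on a given timestamp:

public static UUID getTimeUUID(long when)

You will have to parse the string into a DateTime or Instant, and then pass that DateTime/Instant's milliseconds to getTimeUUID.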
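The parsing step described above can be sketched with java.time alone. This is a minimal sketch, not the poster's code: the object name TimestampParsing and the method toEpochMillis are hypothetical, and the pattern assumes every timestamp has the shape shown in the error message (2019-05-09T09:00:52.553+0000).

```scala
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter

// Hypothetical helper, for illustration only.
object TimestampParsing {
  // 'Z' in the pattern matches a numeric offset without a colon, e.g. +0000.
  private val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")

  // Parses the timestamp string and returns milliseconds since the Unix epoch.
  // Throws java.time.format.DateTimeParseException if the string does not match.
  def toEpochMillis(timestamp: String): Long =
    OffsetDateTime.parse(timestamp, formatter).toInstant.toEpochMilli

  def main(args: Array[String]): Unit = {
    val millis = toEpochMillis("2019-05-09T09:00:52.553+0000")
    println(millis)
  }
}
```

The resulting Long is the kind of value that UUIDs.startOf in the question expects, in place of the failing toString.toLong call.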

Answer 2 (score: 0)

I managed to do it by converting the timestamp string to a DateTime, taking its millis, and then generating the uuid from them:

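import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.util.UUID
import com.datastax.driver.core.utils.UUIDs
import com.datastax.spark.connector._ // provides saveToCassandra

// Matches timestamps such as 2019-05-09T09:00:52.553+0000
val dateTimePattern = "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
val dateFormatter = DateTimeFormatter.ofPattern(dateTimePattern)
// `random` is used below but not defined in the snippet; assumed to be a java.util.Random
val random = new java.util.Random()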

val allJson = rdd.
              map(x => {
                implicit val formats: DefaultFormats.type = org.json4s.DefaultFormats
                //use serialization default to format a Map to JSON
                (x, Serialization.write(x))
              }).
              filter(x => x._1 isDefinedAt "header").
              map(x => (x._1("header"), x._2)).
              filter(x => (x._1 isDefinedAt "userID") &&
                (x._1 isDefinedAt "eventID") &&
                (x._1 isDefinedAt "sessionID") &&
                (x._1 isDefinedAt "timestamp")).
              map(x => {
                var millis: Long  = System.currentTimeMillis() // if timestamp format is invalid, put current timestamp instead
                try {
                  val dateStr: String = x._1("timestamp").asInstanceOf[String]
                  // timestamp from event json
                  // create DateTime from Timestamp string
                  val dateTime: ZonedDateTime = ZonedDateTime.parse(dateStr, dateFormatter)
                  // create millis from DateTime
                  millis = dateTime.toInstant.toEpochMilli
                } catch {
                  case e: Exception =>
                    e.printStackTrace()
                }
                // generate timeuuid
                val uuid = new UUID(UUIDs.startOf(millis).getMostSignificantBits, random.nextLong)
                // generate eventDate
                val eventDate = com.datastax.driver.core.LocalDate.fromMillisSinceEpoch(millis)
                cassFormat(x._1("eventID").toString,
                  x._1("sessionID").toString,
                  uuid,
                  x._1("userID").toString,
                  eventDate,
                  x._2)
              })
            allJson.saveToCassandra(CASSANDRA_KEYSPACE_NAME, CASSANDRA_EVENTS_TABLE)
        }
      })
The timestamp column in Cassandra now looks like: 58976340-7313-11e9-910d-60dce7513b94
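To check that a value like this really carries the event time, the timestamp can be read back out of the UUID. The sketch below uses only java.util.UUID; the object name TimeuuidDecode and the method unixMillis are hypothetical, and the arithmetic relies on the standard version 1 UUID layout (100-nanosecond ticks counted from 1582-10-15), which is an assumption about how the stored value was built rather than something the thread states.

```scala
import java.util.UUID

// Hypothetical helper, for illustration only.
object TimeuuidDecode {
  // Milliseconds between the UUID epoch (1582-10-15T00:00:00Z)
  // and the Unix epoch (1970-01-01T00:00:00Z).
  private val UuidEpochOffsetMillis = 12219292800000L

  // Returns the embedded time of a version 1 UUID as Unix epoch milliseconds.
  def unixMillis(uuid: UUID): Long = {
    require(uuid.version() == 1, "not a time-based (version 1) UUID")
    // timestamp() is expressed in 100-nanosecond units since the UUID epoch.
    uuid.timestamp() / 10000L - UuidEpochOffsetMillis
  }

  def main(args: Array[String]): Unit = {
    val uuid = UUID.fromString("58976340-7313-11e9-910d-60dce7513b94")
    println(java.time.Instant.ofEpochMilli(unixMillis(uuid)))
  }
}
```

Inside a Spark job that already depends on the DataStax driver, UUIDs.unixTimestamp (used in Answer 0) serves the same purpose.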