How to use joda.time in Flink (or how to use typeutils.runtime.kryo)

Time: 2015-11-11 14:14:23

Tags: serialization jodatime kryo apache-flink

In a Flink project I use a case class named click.

case class click(date: LocalDateTime, stbId: String, channelId: Int)

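A DataSet is populated with this class. With the date as a Java 8 java.time.LocalDateTime this worked fine. After switching to org.joda (version 2.9) in a Java 7 environment, calls on the click objects in the DataSet no longer behave as before. Some functions that access the date field of a click object throw NullPointerExceptions; examples are getHourOfDay and toString. I was able to make sure that the date field of the click objects is not null.

I suspect that the Joda-Time library does not interact well with Kryo serialization. See "joda DateTime format cause null pointer error in spark RDD functions" and "NPE in spark with Joda DateTime".

In the Flink API there is a static method registerJodaTime in org.apache.flink.api.java.typeutils.runtime.kryo.Serializers, which seems related. I naively tried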

import  org.apache.flink.api.common._
import org.apache.flink.api.java.typeutils.runtime.kryo._
Serializers.registerJodaTime(new ExecutionConfig)

This was not enough. Am I on the right track? How do I use java.typeutils.runtime.kryo?

Versions used: Flink 0.9.1, Scala 2.10, joda.time 2.9

Follow-up: here is the exact code I added as suggested (thanks to Fabian and Robert)

val env = ExecutionEnvironment.getExecutionEnvironment
//import  org.apache.flink.api.common._
import org.apache.flink.api.java.typeutils.runtime.kryo._
Serializers.registerJodaTime(env.getConfig)

In the log file of the embedded execution I can find the following relevant parts:

16:44:53,998 INFO  org.apache.flink.api.java.ExecutionEnvironment                - The job has 2 registered types and 0 default Kryo serializers
16:44:53,998 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo types: 
16:44:53,998 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo with Serializers types: 
16:44:53,998 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo with Serializer Classes types: Entry{k=class org.joda.time.DateTime, v=class de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer},Entry{k=class org.joda.time.Interval, v=class de.javakaffee.kryoserializers.jodatime.JodaIntervalSerializer}
16:44:53,998 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo default Serializers: 
16:44:53,998 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo default Serializers Classes 
16:44:53,998 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered POJO types: 
16:44:53,998 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Static code analysis mode: DISABLE
16:44:54,545 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
16:44:54,560 DEBUG akka.event.EventStream                                        - logger log1-Slf4jLogger started
....
16:44:57,103 DEBUG org.apache.flink.api.java.typeutils.TypeExtractor             - class org.joda.time.LocalDateTime does not contain a getter for field iLocalMillis
16:44:57,103 DEBUG org.apache.flink.api.java.typeutils.TypeExtractor             - class org.joda.time.LocalDateTime does not contain a setter for field iLocalMillis
16:44:57,103 INFO  org.apache.flink.api.java.typeutils.TypeExtractor                 - class org.joda.time.LocalDateTime is not a valid POJO type
16:44:57,275 DEBUG org.apache.flink.api.scala.ClosureCleaner$                        - accessedFields: Map()
16:44:57,369 INFO  org.apache.flink.api.java.ExecutionEnvironment                - The job has 2 registered types and 0 default Kryo serializers
16:44:57,369 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo types: 
16:44:57,369 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo with Serializers types: 
16:44:57,369 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo with Serializer Classes types: Entry{k=class org.joda.time.DateTime, v=class de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer},Entry{k=class org.joda.time.Interval, v=class de.javakaffee.kryoserializers.jodatime.JodaIntervalSerializer}
16:44:57,369 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo default Serializers: 
16:44:57,369 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo default Serializers Classes 
16:44:57,369 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered POJO types: 
16:44:57,369 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Static code analysis mode: DISABLE

Nevertheless, I still see the following:

Exception in thread "main" java.lang.NullPointerException
    at org.joda.time.LocalDateTime.isSupported(LocalDateTime.java:625)
    at org.joda.time.format.DateTimeFormatterBuilder$PaddedNumber.printTo(DateTimeFormatterBuilder.java:1435)
    at org.joda.time.format.DateTimeFormatterBuilder$Composite.printTo(DateTimeFormatterBuilder.java:2474)
    at org.joda.time.format.DateTimeFormatter.printTo(DateTimeFormatter.java:655)
    at org.joda.time.format.DateTimeFormatter.print(DateTimeFormatter.java:709)
    at org.joda.time.LocalDateTime.toString(LocalDateTime.java:2087)
    at java.lang.String.valueOf(Unknown Source)
    at scala.runtime.StringAdd$.$plus$extension(StringAdd.scala:13)
    at myflink.click.toString(Ingestor.scala:20)
    ...

2 answers:

Answer 0 (score: 4)

Flink uses Kryo for types that it cannot serialize itself. LocalDateTime is such a class.

Sadly, Kryo cannot serialize it correctly either, so we have to tell Kryo how to do it by providing a specialized serializer for this class.

  1. Add de.javakaffee:kryo-serializers as a dependency:

    <dependency>
        <groupId>de.javakaffee</groupId>
        <artifactId>kryo-serializers</artifactId>
        <version>0.30</version>
    </dependency>

    (Note that adding this dependency might cause problems when using Flink on a cluster. Please let us know if it does.)

  2. Register the new serializer with the ExecutionEnvironment:

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.registerTypeWithKryoSerializer(classOf[LocalDateTime], classOf[JodaLocalDateTimeSerializer])

I hope that helps (I am leaving my old answer here for reference).
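The registration step from this answer can be sketched as a small standalone job. This is only a sketch under assumptions: it presumes Flink 0.9.x (where print() triggers execution), the kryo-serializers dependency on the classpath, and org.joda.time.LocalDateTime as the date type. The object name JodaKryoSketch and the sample record are hypothetical. The log in the question lists only DateTime and Interval as registered by registerJodaTime, which is consistent with LocalDateTime needing its own registration.

```scala
import org.apache.flink.api.scala._
import org.joda.time.LocalDateTime
import de.javakaffee.kryoserializers.jodatime.JodaLocalDateTimeSerializer

// Same shape as the case class in the question.
case class click(date: LocalDateTime, stbId: String, channelId: Int)

// Hypothetical object name, used only for this sketch.
object JodaKryoSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Register the serializer on the environment that actually runs the job,
    // so Kryo uses it for every LocalDateTime field it encounters.
    env.registerTypeWithKryoSerializer(
      classOf[LocalDateTime], classOf[JodaLocalDateTimeSerializer])

    // Hypothetical sample record; real data would come from the job's source.
    val clicks = env.fromElements(
      click(new LocalDateTime(2015, 11, 11, 14, 14, 23), "stb-1", 7))

    // Accessing the date field is the kind of call that threw the NPE in the question.
    clicks.map(c => c.date.getHourOfDay).print()
  }
}
```

If the NullPointerException persists, the DEBUG output described in the debugging remarks should show whether org.joda.time.LocalDateTime appears among the registered Kryo types.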

General remarks on debugging Kryo / serializer issues in Flink:

When executing a job locally (this should also work with the ./bin/flink frontend, but there the output may end up in the log/ directory), you should see something like:

14:05:52,863 INFO  org.apache.flink.api.java.ExecutionEnvironment                - The job has 15 registered types and 2 default Kryo serializers 
14:05:52,943 INFO  org.apache.flink.runtime.minicluster.FlinkMiniCluster         - Starting FlinkMiniCluster. 
14:05:53,150 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started

Here the number of registered types and Kryo serializers is above 0.

With the DEBUG log level (replace INFO with DEBUG in log4j.properties), you can get more details about the registered serializers:

14:10:39,935 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo types: 
14:10:39,935 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo with Serializers types: 
14:10:39,935 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo with Serializer Classes types: 
14:10:39,935 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo default Serializers: 
14:10:39,935 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered Kryo default Serializers Classes 
14:10:39,935 DEBUG org.apache.flink.api.java.ExecutionEnvironment                - Registered POJO types: 
      

Answer 1 (score: 3)

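You should register the Joda serializers on the ExecutionConfig of the ExecutionEnvironment, that is, pass env.getConfig to Serializers.registerJodaTime (as in the question's follow-up) rather than a newly created ExecutionConfig.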

Hope this helps.
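The difference between the question's two attempts can be sketched as follows. This is a minimal illustration, assuming Flink 0.9.x with the Scala API. The object name RegisterJodaSketch is hypothetical. Going by the log in the question, registerJodaTime covers org.joda.time.DateTime and org.joda.time.Interval, so a LocalDateTime field may still need the explicit registration shown in the other answer.

```scala
import org.apache.flink.api.common.ExecutionConfig
import org.apache.flink.api.java.typeutils.runtime.kryo.Serializers
import org.apache.flink.api.scala._

// Hypothetical object name, used only for this sketch.
object RegisterJodaSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // First attempt from the question: the fresh ExecutionConfig is a
    // throwaway object that the environment never sees.
    Serializers.registerJodaTime(new ExecutionConfig)

    // Follow-up attempt: env.getConfig is the configuration the job is built
    // with, so serializers registered here are the ones reported in the log.
    Serializers.registerJodaTime(env.getConfig)
  }
}
```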