如何在flink中使用HDFS接收器?

时间:2017-10-15 20:51:21

标签: java hadoop hdfs apache-flink flink-streaming

我在apache flink中使用twitter连接器,现在想在我的本地hdfs实例中保存一些流数据。在flink文档中有一个小的BucketerSink示例,但我的程序总是退出并出现以下错误:

Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:933)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ExceptionInInitializerError
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:92)
at org.apache.hadoop.security.Groups.<init>(Groups.java:76)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:239)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:232)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:718)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:703)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:605)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2473)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2465)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2331)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:418)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:352)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:177)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:159)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:105)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:251)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:678)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:666)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 1
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3116)
at java.base/java.lang.String.substring(String.java:1885)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:49)
... 25 more

您对我的代码出了什么问题有什么想法吗?我使用inital twitter连接器示例进行测试,我的环境是用hdfs的docker容器构建的。端口从docker正确映射到我的本地机器,我也可以检查web ui上hdfs的状态。

这是我的代码方法:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);

    Properties props = new Properties();
    props.setProperty(TwitterSource.CONSUMER_KEY, "KEY");
    props.setProperty(TwitterSource.CONSUMER_SECRET, "SECRET");
    props.setProperty(TwitterSource.TOKEN, "TOKEN");
    props.setProperty(TwitterSource.TOKEN_SECRET, "TOKENSECRET");
    DataStream<String> streamSource = env.addSource(new TwitterSource(props));

    DataStream<Tuple2<String, Integer>> tweets = streamSource
            // selecting English tweets and splitting to (word, 1)
            .flatMap(new SelectGermanAndTokenizeFlatMap())
            // group by words and sum their occurrences
            .keyBy(0)
            .timeWindow(Time.minutes(1), Time.seconds(30))
            .sum(1);


    BucketingSink<Tuple2<String, Integer>> sink = new BucketingSink<>("hdfs://localhost:8020/flink/twitter-test");
    sink.setBucketer(new DateTimeBucketer<Tuple2<String, Integer>>("yyyy-MM-dd--HHmm"));
    sink.setBatchSize(1024 * 1024 * 400);

    tweets.addSink(sink);
    //tweets.print();

    env.execute("Twitter Streaming Example");

0 个答案:

没有答案