Flume HDFS的问题从Twitter下沉

时间:2014-09-06 11:03:06

标签: twitter hdfs flume sink

我目前在Flume中有这个配置:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'TwitterAgent'
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = YPTxqtRamIZ1bnJXYwGW
TwitterAgent.sources.Twitter.consumerSecret = Wjyw9714OBzao7dktH0csuTByk4iLG9Zu4ddtI6s0ho
TwitterAgent.sources.Twitter.accessToken = 2340010790-KhWiNLt63GuZ6QZNYuPMJtaMVjLFpiMP4A2v
TwitterAgent.sources.Twitter.accessTokenSecret = x1pVVuyxfvaTbPoKvXqh2r5xUA6tf9einoByLIL8rar
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hadoop1:8020/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

twitter app auth键是正确的。 我一直在水槽日志文件中收到此错误:

ERROR   org.apache.flume.SinkRunner     

Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop1
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:446)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop1
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:448)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2310)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2344)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2326)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:353)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:227)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:221)
    at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:589)
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:161)
    at org.apache.flume.sink.hdfs.BucketWriter.access$800(BucketWriter.java:57)
    at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:586)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    ... 1 more
Caused by: java.net.UnknownHostException: hadoop1
    ... 23 more

这里有人知道为什么,可以向我解释一下吗? 提前谢谢。

1 个答案:

答案 0 :(得分:0)

根据例外情况,问题是主机hadoop1未知。

根据水槽配置文件,您提供的路径是

hdfs://hadoop1:8020/user/flume/tweets/%Y/%m/%d/%H/

应该可以使用水槽剂从机器上访问。由于机器名称不能用于访问HDFS而不在同一域中,因此您需要使用core-site.xml

中设置的IP地址访问HDFS