Streaming weather data into Flume

Time: 2015-03-18 11:29:54

Tags: hadoop-streaming flume-ng

I am a beginner, but I want to stream weather data from any website into my HDFS location. So I created a sink, a source, and a channel, as shown below:

weather.channels= memory-channel
weather.channels.memory-channel.capacity=10000
weather.channels.memory-channel.type = memory
weather.sinks = hdfs-write
weather.sinks.hdfs-write.channel=memory-channel
weather.sinks.hdfs-write.type = logger
weather.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/hadoop/flume
weather.sinks.hdfs-write.rollInterval = 1200
weather.sinks.hdfs-write.hdfs.writeFormat=Text
weather.sinks.hdfs-write.hdfs.fileType=DataStream
weather.sources= Weather
weather.sources.Weather.bind =  api.openweathermap.org/data/2.5/forecast/city?id=524901&APPID=********************************
weather.sources.Weather.channels=memory-channel
weather.sources.Weather.type = netcat
weather.sources.Weather.port = 80

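So I am using an API here for this. What else could I use to stream weather data? Which online sites could I use, or which API should I use to configure the source? When I run the flume-ng command to start the agent, I get the following: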

15/03/18 11:13:28 ERROR lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner:{ source:org.apache.flume.source.http.HTTPSource{name:Weather,state:IDLE} } - Exception follows.
java.lang.IllegalStateException: Running HTTP Server found in source:Weather before I started one.Will not attempt to start.
at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:189)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745) 
C15/03/18 11:13:31 INFO lifecycle.LifecycleSupervisor: Stopping lifecycle supervisor 10
15/03/18 11:13:31 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider stopping
15/03/18 11:13:31 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory-channel stopped

1 answer:

Answer 0 (score: 0)

The "lifecycle" error you are seeing is a consequence of an earlier failed attempt to start the HTTP server, not the root cause.

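The original error is probably caused by trying to bind to the privileged port 80 as a non-root user. Change the port to something above 1024, for example 8080.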
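A minimal sketch of that port change, assuming the agent is really running an HTTP source, as the stack trace suggests (the posted config says netcat). The bind address 0.0.0.0 is an assumption; any local address would do.

```properties
# Sketch only: listen on an unprivileged port so a non-root user can bind.
weather.sources.Weather.type = http
# Assumed value. bind is a local address to listen on, not a remote URL.
weather.sources.Weather.bind = 0.0.0.0
weather.sources.Weather.port = 8080
weather.sources.Weather.channels = memory-channel
```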

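However, this still will not work the way you expect. The http and netcat sources listen for incoming calls; they do not fetch the URL you set in bind.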

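I see two options:

  1. Create a Linux daemon that periodically runs wget or curl against that URL and saves the result to a file, then configure Flume with a spooling directory source.
  2. Write your own Flume source that periodically polls that URL.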
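
Option 1 could be configured roughly as follows. This is a sketch under assumptions, not a tested setup: the directory /var/spool/weather is hypothetical, and a separate job (not shown) is assumed to download each response elsewhere and then move the finished file into that directory, because the spooling directory source expects files not to change once they appear there. The agent, channel, and sink names and the HDFS path are taken from the question.

```properties
# Sketch only. /var/spool/weather is a hypothetical directory that an
# external job (for example cron + curl) moves completed files into.
weather.sources = Weather
weather.channels = memory-channel
weather.sinks = hdfs-write

weather.sources.Weather.type = spooldir
weather.sources.Weather.spoolDir = /var/spool/weather
weather.sources.Weather.channels = memory-channel

weather.channels.memory-channel.type = memory
weather.channels.memory-channel.capacity = 10000

# The sink type needs to be hdfs (not logger) for the hdfs.* settings to apply.
weather.sinks.hdfs-write.type = hdfs
weather.sinks.hdfs-write.channel = memory-channel
weather.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/hadoop/flume
weather.sinks.hdfs-write.hdfs.rollInterval = 1200
weather.sinks.hdfs-write.hdfs.fileType = DataStream
weather.sinks.hdfs-write.hdfs.writeFormat = Text
```

By default this source emits one event per line, and lines longer than deserializer.maxLineLength (2048 by default) are split across several events, so a large single-line JSON response may need that limit raised.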