Trying to get Spark Streaming to read a data stream from a website: what is the socket?

Asked: 2015-06-05 17:55:38

Tags: hadoop apache-spark spark-streaming rdd

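I am trying to get the data from http://stream.meetup.com/2/rsvps into Spark Streaming.

The records are JSON objects. I know the lines will arrive as strings; I just want the stream to work before I try to parse the JSON.

I don't know what to put as the port, and I think that is the problem.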

SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("Spark Streaming");

JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

JavaReceiverInputDStream<String> lines = jssc.socketTextStream("http://stream.meetup.com/2/rsvps", 80);


lines.print();

jssc.start();
jssc.awaitTermination();

This is the error I get:

java.net.UnknownHostException: http://stream.meetup.com/2/rsvps
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:208)

2 answers:

Answer 0 (score: 2)

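socketTextStream is not meant to work as an HTTP client. It opens a raw TCP connection to a hostname and port and reads newline-delimited text, so passing a full URL as the hostname fails with UnknownHostException. As you have noted, you need to create a custom receiver. A possible starting point is the receiver that was created as part of the meetup-stream data source (see https://github.com/actions/meetup-stream/blob/master/src/main/scala/receiver/MeetupReceiver.scala).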
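To make the host/port distinction concrete, here is a minimal Java sketch that splits the URL from the question into the parts a raw socket can use and the part it cannot. StreamEndpoint is a hypothetical helper written for this illustration, not a Spark class, and the fallback to port 80 or 443 is an assumption based on the URL scheme.

```java
import java.net.URI;

// Hypothetical helper (not part of Spark): splits a URL into the pieces a raw
// TCP socket needs (host, port) and the piece it has no place for (path).
final class StreamEndpoint {
    final String host;
    final int port;
    final String path;

    StreamEndpoint(String host, int port, String path) {
        this.host = host;
        this.port = port;
        this.path = path;
    }

    static StreamEndpoint parse(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort();
        if (port == -1) {
            // No explicit port in the URL: fall back to the scheme default.
            port = "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
        }
        return new StreamEndpoint(uri.getHost(), port, uri.getRawPath());
    }

    public static void main(String[] args) {
        StreamEndpoint e = parse("http://stream.meetup.com/2/rsvps");
        System.out.println(e.host + " " + e.port + " " + e.path);
    }
}
```

Even with the host and port separated like this, socketTextStream would only open the connection and wait for text. It never sends an HTTP GET for the path, which is why a custom receiver is still needed.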

Answer 1 (score: 0)

Here is a custom UrlReceiver that follows the Spark documentation on custom receivers:

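import java.io.{BufferedReader, InputStreamReader}
import java.net.URL

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Receives newline-delimited text from a URL and stores each non-empty line.
class UrlReceiver(urlStr: String) extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // onStart must not block, so the reading happens on a separate thread.
    new Thread("Url Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  override def onStop(): Unit = {
    // Nothing to do: the read loop checks isStopped and exits on its own.
  }

  private def receive(): Unit = {
    try {
      val reader = new BufferedReader(
        new InputStreamReader(new URL(urlStr).openConnection.getInputStream, "UTF-8")
      )
      try {
        var msg = reader.readLine
        while (!isStopped && msg != null) {
          if (!msg.isEmpty) {
            store(msg)
          }
          msg = reader.readLine
        }
      } finally {
        reader.close()
      }
      // The stream ended; ask Spark to restart the receiver and reconnect.
      restart("Stream ended, trying to connect again")
    } catch {
      case e: Exception => restart("Error receiving data from " + urlStr, e)
    }
  }
}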

Then use it like this, where sc is a StreamingContext:

val lines = sc.receiverStream(new UrlReceiver("http://stream.meetup.com/2/rsvps"))
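Since the question uses the Java API, the read loop of UrlReceiver can also be sketched in plain Java. LineForwarder below is a hypothetical helper written for this illustration and has no Spark dependency. In an actual Java receiver, the assumption is that the sink would be the receiver's store method and the Reader would wrap the input stream of the URL connection. The sample lines in main are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not a Spark class): the read loop of UrlReceiver,
// written against plain java.io so it can be run without a cluster.
final class LineForwarder {

    // Reads lines until end of stream and passes each non-empty line to sink.
    // Returns the number of lines forwarded.
    static int forward(Reader source, Consumer<String> sink) throws IOException {
        int forwarded = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    sink.accept(line);
                    forwarded++;
                }
            }
        }
        return forwarded;
    }

    public static void main(String[] args) throws IOException {
        // Made-up stand-in for the HTTP response body: one JSON object per line.
        Reader body = new StringReader("{\"id\":1}\n\n{\"id\":2}\n");
        List<String> received = new ArrayList<>();
        int n = forward(body, received::add);
        System.out.println(n + " lines: " + received);
    }
}
```

Keeping the loop separate from the receiver class makes it possible to test the line handling with a StringReader before connecting to the live stream.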