Spark shell: 'path' is not specified exception

Time: 2017-04-27 00:50:33

Tags: scala apache-spark streaming apache-kafka

I am using Spark shell 2.1 on Linux.

./bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0

The Spark shell starts without any problems.

val ds1 = spark.readStream.option("kafka.bootstrap.servers", "xx.xx.xxx.xxx:9092,xx.xx.xxx.xxx:9092").option("subscribe", "MickyMouse").load()

I get the following exception:

java.lang.IllegalArgumentException: 'path' is not specified
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:205)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:205)
  at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
  at org.apache.spark.sql.catalyst.util.CaseInsensitiveMap.getOrElse(CaseInsensitiveMap.scala:23)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:204)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:124)

The Kafka server is up and running.

Any ideas on how I can read from the Kafka source successfully?

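2 answers:

Answer 0 (score: 0)

You forgot to call the format method. The default format is parquet, which is why Spark is looking for a path. Changing the code to spark.readStream.format("kafka").option... fixes the problem.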
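A minimal sketch of the corrected reader, written as a standalone app so it is self-contained. It assumes the spark-sql-kafka-0-10 package from the question is on the classpath; the broker address localhost:9092 is a hypothetical placeholder for your own bootstrap servers, and the console sink is only one way to see the data arrive. Inside spark-shell you can skip the builder and use the existing `spark` value.

```scala
import org.apache.spark.sql.SparkSession

object KafkaReadSketch {
  def main(args: Array[String]): Unit = {
    // In spark-shell, `spark` already exists; the builder is only needed in a standalone app.
    val spark = SparkSession.builder()
      .appName("KafkaReadSketch")
      .getOrCreate()

    // format("kafka") selects the Kafka source. Without it, Spark falls back to
    // the default source (parquet), which requires a 'path' option.
    val ds1 = spark.readStream
      .format("kafka")
      // Hypothetical broker address: replace with your own bootstrap servers.
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "MickyMouse")
      .load()

    // Kafka keys and values arrive as binary columns; cast them to strings for display.
    val messages = ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Print each micro-batch to stdout until the query is stopped.
    val query = messages.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

Calling `format` before `load()` is the only change the fix needs. The cast and console sink just make it easy to confirm that records are flowing.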

Answer 1 (score: 0)

This should solve the problem:

let myRequest = NSMutableURLRequest(url: URL(string: "https://news.ycombinator.com/item?id=19722704")!)

let dataTask : URLSessionTask = URLSession.shared.dataTask(with: myRequest as URLRequest, completionHandler: { data, response, error in

    guard error == nil else {
        return
    }

    guard let data = data else {
        return
    }

    if let htmlString = String(bytes: data, encoding: String.Encoding.utf8), let doc = try? HTML(html: htmlString, encoding: .utf8) {

        for postDescription in doc.xpath("//*[@id=\"hnmain\"]//tr[3]/td/table[1]//tr[4]/td[2]") {
            print("postDescription: \(String(describing: postDescription.content))")
        }

        for comment in doc.xpath("//table[@class=\"comment-tree\"]//tr") {
            print("Comment: \(String(describing: comment.content))")
        }
    }

})
dataTask.resume()

I know this response comes very late, but it may help others who run into a similar problem.