I am trying to run the following code from IntelliJ IDEA to print messages from Kafka to the console, but it throws the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;
The stack trace starts at Dataset.checkpoint. If I remove .checkpoint(), I get a permission-related error instead:
17/08/02 12:10:52 ERROR StreamMetadata: Error writing stream metadata StreamMetadata(4e612f22-efff-4c9a-a47a-a36eb533e9d6) to C:/Users/rp/AppData/Local/Temp/temporary-2f570b97-ad16-4f00-8356-d43ccb7660db/metadata
java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\rp\AppData\Local\Temp\temporary-2f570b97-ad16-4f00-8356-d43ccb7660db\metadata
Source:
def main(args : Array[String]) = {
  val spark = SparkSession.builder().appName("SparkStreaming").master("local[*]").getOrCreate()
  val canonicalSchema = new StructType()
    .add("cid", StringType)
    .add("uid", StringType)
    .add("sourceSystem",
      new StructType().add("id", StringType)
        .add("name", StringType))
    .add("name", new StructType()
      .add("firstname", StringType)
      .add("lastname", StringType))
  val messages = spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "c_canonical")
    .option("startingOffset", "earliest")
    .load()
    .checkpoint()
    .select(from_json(col("value").cast("string"), canonicalSchema))
    .writeStream.outputMode("append").format("console").start.awaitTermination
}
Can anyone help me understand what I am doing wrong?
Answer 0: (score: 1)
Structured Streaming does not support Dataset.checkpoint(). There is an open ticket to either provide a better error message or simply ignore the call: https://issues.apache.org/jira/browse/SPARK-20927
The IOException is probably because you do not have cygwin installed on Windows.
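For reference, here is a minimal sketch of the same pipeline with the .checkpoint() call removed, since that is the call Structured Streaming rejects. It assumes the spark-sql and spark-sql-kafka-0-10 artifacts are on the classpath and that a broker is reachable at localhost:9092, as in the question. The object name KafkaToConsole and the column alias "message" are illustrative only. The sketch also spells the option as startingOffsets, which is the name the Kafka source documents; the question's startingOffset looks like a typo. It does not address the Windows IOException.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical object name, used only for this sketch.
object KafkaToConsole {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkStreaming")
      .master("local[*]")
      .getOrCreate()

    // Same schema as in the question.
    val canonicalSchema = new StructType()
      .add("cid", StringType)
      .add("uid", StringType)
      .add("sourceSystem", new StructType()
        .add("id", StringType)
        .add("name", StringType))
      .add("name", new StructType()
        .add("firstname", StringType)
        .add("lastname", StringType))

    // No Dataset.checkpoint() here: it is a batch API and triggers the
    // AnalysisException when called on a streaming Dataset.
    val parsed = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "c_canonical")
      .option("startingOffsets", "earliest")
      .load()
      .select(from_json(col("value").cast("string"), canonicalSchema).as("message"))

    // The console sink uses a temporary checkpoint directory by default.
    // For a durable sink (file, Kafka), the checkpoint directory is set on
    // the writer via .option("checkpointLocation", <path>) instead.
    val query = parsed.writeStream
      .outputMode("append")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

In Structured Streaming, progress is tracked by the streaming query itself, so checkpointing is configured on writeStream and not on the Dataset.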