I need to develop a Spark application that uses streaming to create RDDs, converts the DStream's RDDs to DataFrames, and runs SQL queries over them, similar to this Spark example. The challenge is that I need to run multiple SQL queries rather than just one. I'm not sure whether DataFrame/SQLContext is thread-safe, so that I could create threads and run the queries and their actions in parallel, or whether there is a cleaner way to do this in Spark. Below is how I do it sequentially, but I would like to run the queries concurrently.
DataFrame wordsDataFrame = sqlContext.createDataFrame(rowRDD, JavaRecord.class);

// Register as table
wordsDataFrame.registerTempTable("words");

// Do word count on table using SQL and print it
DataFrame wordCountsDataFrame =
    sqlContext.sql("select word, count(*) as total from words group by word");
System.out.println("========= " + time + " =========");
wordCountsDataFrame.show();
// Do word count AGAIN on table using SQL, this time only for longer words, and print it
DataFrame wordCountsDataFrame2 =
    sqlContext.sql("select word, count(*) as total from words where length(word) > 10 group by word");
System.out.println("========= " + time + " =========");
wordCountsDataFrame2.show();
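
For reference, this is a minimal sketch of the threaded approach I am considering, assuming the shared SQLContext and the registered "words" temp table can safely be used from multiple threads at once (exactly the part I am unsure about). The runConcurrently helper name and the thread-pool sizing are just illustrative, not an established pattern:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.spark.sql.SQLContext;

public class ParallelQueries {

    // Hypothetical helper: submit each SQL statement as its own task so the
    // queries and their actions run in parallel against the shared SQLContext.
    public static void runConcurrently(final SQLContext sqlContext, List<String> queries)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(queries.size());
        for (final String query : queries) {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    // Each task runs its own query and triggers its own action.
                    sqlContext.sql(query).show();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}

Inside the foreachRDD block, after registering the temp table, this would be called with the two query strings above, e.g. runConcurrently(sqlContext, Arrays.asList(query1, query2)).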