Why can't I get all the rows from val jdbcDF in Spark?

时间:2019-06-14 09:02:17

标签: postgresql scala apache-spark apache-spark-sql

Because the data volume is huge, I am trying to load the table from the server to measure how long it takes Spark to fetch it, but I cannot get the data back. Can anyone help me? Thanks.

    object sparkdb extends App with Context {
      val jdbcDF = sparkSession.read
        .format("jdbc")
        .option("driver", "org.postgresql.Driver")
        .option("url", "jdbc:postgresql://finp01/fbb")
        .option("dbtable", "fpaint.fpa_re")
        .option("user", "postgres")
        .option("password", "password")
        .load()
      jdbcDF.createOrReplaceTempView("fpa_re")
      jdbcDF.printSchema()

      // 1st attempt: query the temp view by the name it was registered
      // under ("fpa_re", not "fpaint.fpa_re")
      val data = sparkSession.sql("select * from fpa_re")
      data.show()

      // 2nd attempt: select() with no arguments projects zero columns,
      // so call show() on the DataFrame directly
      jdbcDF.show()
    }
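
By default Spark's JDBC source reads the whole table through a single connection and a single partition, so a very large table can look like a hang. A minimal sketch of a partitioned read is below, reusing the question's sparkSession; `fpa_id` is a hypothetical numeric column, and the bounds are placeholders that would come from a min/max query against the real table:

    // Sketch: parallelize the JDBC read across 8 partitions.
    // "fpa_id" and the bounds are assumptions -- substitute a real
    // numeric column and its actual min/max values from fpaint.fpa_re.
    val partitionedDF = sparkSession.read
      .format("jdbc")
      .option("driver", "org.postgresql.Driver")
      .option("url", "jdbc:postgresql://finp01/fbb")
      .option("dbtable", "fpaint.fpa_re")
      .option("user", "postgres")
      .option("password", "password")
      .option("partitionColumn", "fpa_id") // hypothetical numeric key
      .option("lowerBound", "1")           // min(fpa_id), placeholder
      .option("upperBound", "1000000")     // max(fpa_id), placeholder
      .option("numPartitions", "8")        // 8 parallel JDBC connections
      .load()

    partitionedDF.show()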
build.sbt

name := "Self_test"

version := "0.1"

scalaVersion := "2.12.0"

// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.4.3"
libraryDependencies += "org.postgresql" % "postgresql" % "42.2.5"

The full-table job itself runs continuously without producing any output:

    19/06/14 13:49:00 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on  (size: 4.6 KB, free: 1988.7 MB)
    19/06/14 13:49:00 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
    19/06/14 13:49:00 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at show at sparkdb.scala:27) (first 15 tasks are for partitions Vector(0))
    19/06/14 13:49:00 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
    19/06/14 13:49:00 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7169 bytes)
    19/06/14 13:49:00 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
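
The log itself points at the cause: "Adding task set 0.0 with 1 tasks" and "partitions Vector(0)" show the entire table going through one task over one JDBC connection. On top of that, the PostgreSQL driver by default buffers an entire result set in memory before returning any rows, so a huge table can sit for a long time with no output. Spark's `fetchsize` option maps to the driver's fetch size and makes the read stream in batches; a sketch with an arbitrary starting batch size of 10000:

    // Sketch: stream rows from PostgreSQL in batches instead of
    // buffering the whole result set. 10000 is an arbitrary value
    // to tune against the real table.
    val streamingDF = sparkSession.read
      .format("jdbc")
      .option("driver", "org.postgresql.Driver")
      .option("url", "jdbc:postgresql://finp01/fbb")
      .option("dbtable", "fpaint.fpa_re")
      .option("user", "postgres")
      .option("password", "password")
      .option("fetchsize", "10000") // rows per round trip
      .load()

    streamingDF.show()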

0 Answers:

No answers