The data volume is huge, so I tried loading the data from the server to check how long it takes to fetch it with Spark, but I cannot get any data back. Can anyone help me? Thanks.
object sparkdb extends App with Context {
  val jdbcDF = sparkSession.read
    .format("jdbc")
    .option("driver", "org.postgresql.Driver")
    .option("url", "jdbc:postgresql://finp01/fbb")
    .option("dbtable", "fpaint.fpa_re")
    .option("user", "postgres")
    .option("password", "password")
    .load()

  jdbcDF.createOrReplaceTempView("fpa_re")
  jdbcDF.printSchema()

  // 1st thing I tried: query the temp view
  // (it was registered as "fpa_re", so "fpaint.fpa_re" would not resolve)
  val data = sparkSession.sql("select * from fpa_re")
  data.show()

  // 2nd thing I tried: select() with no arguments projects zero columns,
  // so show the DataFrame directly instead
  jdbcDF.show()
}
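For a timing test it is usually not necessary to pull the whole table: the JDBC source accepts a subquery in `dbtable`, so a row limit can be pushed down to PostgreSQL. A minimal sketch (the `limit 1000` and the alias `t` are illustrative choices, not part of the original code):

```scala
// Sketch: push a LIMIT down to PostgreSQL so the timing check doesn't
// have to stream all of fpaint.fpa_re across the network.
val sampleDF = sparkSession.read
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://finp01/fbb")
  // a subquery in dbtable must be aliased ("as t") to be valid SQL
  .option("dbtable", "(select * from fpaint.fpa_re limit 1000) as t")
  .option("user", "postgres")
  .option("password", "password")
  .load()

sampleDF.show()
```

If this returns quickly, the hang on the full table is a data-volume problem rather than a connectivity one.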
build.sbt:

name := "Self_test"

version := "0.1"

scalaVersion := "2.12.0"

// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.4.3"
libraryDependencies += "org.postgresql" % "postgresql" % "42.2.5"
It just keeps running with no output. Here is the log:
19/06/14 13:49:00 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on (size: 4.6 KB, free: 1988.7 MB)
19/06/14 13:49:00 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
19/06/14 13:49:00 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at show at sparkdb.scala:27)
(first 15 tasks are for partitions Vector(0))
19/06/14 13:49:00 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
19/06/14 13:49:00 INFO TaskSetManager: Starting task 0.0 in stage 0.0
(TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7169 bytes)
19/06/14 13:49:00 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
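For context: the log shows a single task for partition 0, which is what a JDBC read does by default, as the whole table is fetched through one connection. A sketch of a partitioned read that spreads the fetch over several parallel tasks (the column name `id` and its bounds are assumptions; substitute a real numeric column from `fpaint.fpa_re` and its actual min/max):

```scala
// Sketch: partitioned JDBC read so Spark fetches the table with several
// parallel tasks instead of one. "id", lowerBound and upperBound are
// placeholders -- use a real numeric column and its actual value range.
val partitionedDF = sparkSession.read
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://finp01/fbb")
  .option("dbtable", "fpaint.fpa_re")
  .option("user", "postgres")
  .option("password", "password")
  .option("partitionColumn", "id")   // assumed numeric column
  .option("lowerBound", "1")         // assumed min of that column
  .option("upperBound", "1000000")   // assumed max of that column
  .option("numPartitions", "8")      // parallel JDBC connections
  .option("fetchsize", "10000")      // rows per JDBC round trip
  .load()

partitionedDF.show()
```

The bounds only control how rows are split across partitions; rows outside the range are still read, so inaccurate bounds skew the split but do not drop data.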