I want to use a SQL Server table with PySpark. Inspecting sparkDF shows the expected schema:
sparkDF
DataFrame[SPRAS: string, PRCTR: string, DATBI: string, KOKRS: string, KTEXT: string, LTEXT: string, MCTXT: string]
But when I call sparkDF.show(), I get an error. Here is my code:
import os

import pyodbc
import pandas
import findspark as fs

fs.init()
os.environ["JAVA_HOME"] = "C:/Program Files/Java/jdk1.8.0_181"
os.environ["PYTHONPATH"] = os.environ["PYTHON"]

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext  # needed for SQLContext below

conf = (SparkConf()
        .set("spark.cores.max", 3)
        .set("spark.executor.memory", "3G")
        .setAppName('hello')
        .setMaster('spark://xx.xx.xx.xx:7077'))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
spark = sqlContext.sparkSession

# Read the table into pandas over ODBC, then hand it to Spark.
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=xx.xx.xx.xx;'
                      'Database=DEP;'
                      'Trusted_Connection=yes;')
sql = 'SELECT * FROM dep.xxxx'
pdf = pandas.read_sql(sql, conn)

rdd = sc.parallelize(pdf)
sparkDF = spark.createDataFrame(rdd)
sparkDF.show()

sparkDF.createOrReplaceTempView("sparkDF")
sqlDF = spark.sql("SELECT * FROM sparkDF")
sqlDF.show()
The result:
9/11/21 09:42:45 WARN TaskSetManager: Lost task 0.0 in stage 12.0 (TID 63, xx.xx.xx.xx, executor 2): java.io.IOException: Cannot run program "C:\ProgramData\Anaconda3\envs\deneme\python.exe": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:65)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
......
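From the stack trace it looks like an executor on the cluster is trying to launch the Anaconda python.exe path from my Windows machine. If I understand correctly, the workers locate their interpreter via the PYSPARK_PYTHON environment variable, so I wonder whether something like the following has to be set before the SparkContext is created (a sketch only; /usr/bin/python3 is just my guess at a path that exists on the worker machines):

# Assumption: the workers have a usable interpreter at /usr/bin/python3.
# PYSPARK_PYTHON must point at a Python that exists on the executors,
# not at the Anaconda environment on my local Windows machine.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
# The driver side can keep using my local environment.
os.environ["PYSPARK_DRIVER_PYTHON"] = "C:/ProgramData/Anaconda3/envs/deneme/python.exe"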
Do you have any ideas about this? (I'm new to the Python world.)
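On a side note, I read that spark.createDataFrame can take a pandas DataFrame directly, so the intermediate RDD may not be needed at all; if that is true, a simpler variant would be:

# Let Spark convert the pandas frame itself, inferring the
# schema from the pandas dtypes (same pdf as above).
sparkDF = spark.createDataFrame(pdf)
sparkDF.show()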