Unable to read data from Redshift from PySpark

Date: 2018-04-05 17:01:33

Tags: python pyspark amazon-redshift

I am running a Jupyter notebook with Spark.

I have tried the script below:

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession, SQLContext  # SQLContext must be imported before use

# `sc` is the SparkContext provided by the pyspark shell / notebook kernel
sql_context = SQLContext(sc)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "xxxx")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "uyuu")
df = sql_context.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://xxxx") \
    .option("dbtable", "index") \
    .option("forward_spark_s3_credentials", True) \
    .option("tempdir", "s3n://xxxx/temp") \
    .load()

print(df)
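As an aside, the script imports SparkSession but never uses it; a minimal sketch of the same read through the SparkSession entry point (assuming the same placeholder credentials and options as above) would be:

spark = SparkSession.builder.appName("redshift-read").getOrCreate()
spark._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "xxxx")
spark._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "uyuu")

df = (
    spark.read.format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://xxxx")
    .option("dbtable", "index")
    .option("forward_spark_s3_credentials", True)
    .option("tempdir", "s3n://xxxx/temp")
    .load()
)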

This works, and it prints the column details (the schema) of my data in Redshift. However, I want to access the rows themselves, so I called df.show().
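The difference matters because DataFrames are lazy: printing df only displays the schema, which is fetched over JDBC, while show() actually materializes rows, which for spark-redshift means running an UNLOAD to the S3 tempdir and reading the result back. A small sketch of the two calls (the sample size is arbitrary):

df.printSchema()  # metadata only -- succeeds even if the scan path is broken
df.show(5)        # triggers the UNLOAD/S3 scan -- this is the call that fails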

However, this error was returned to me:

Py4JJavaError: An error occurred while calling o53.showString.
: java.lang.NoClassDefFoundError: com/eclipsesource/json/Json

May I know what I did wrong?

PS: These are the steps I took:

  1. I placed these jar files into my Hadoop Spark jars folder: spark-redshift_2.10-3.0.0-preview1.jar, RedshiftJDBC41-1.1.10.1010.jar, hadoop-aws-2.7.1.jar, aws-java-sdk-1.7.4.jar, aws-java-sdk-s3-1.11.60.jar
  2. I ran pyspark and started a Jupyter notebook on localhost
  3. I ran the script above
  4. Error trace log:

      ---> 10 print(df.show())

      C:\Spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\dataframe.py in show(self, n, truncate, vertical)
          348         """
          349         if isinstance(truncate, bool) and truncate:
      --> 350             print(self._jdf.showString(n, 20, vertical))
          351         else:
          352             print(self._jdf.showString(n, int(truncate), vertical))

      C:\Spark\spark-2.3.0-bin-hadoop2.7\python\lib\py4j-0.10.6-src.zip\py4j\java_gateway.py in __call__(self, *args)
         1158         answer = self.gateway_client.send_command(command)
         1159         return_value = get_return_value(
      -> 1160             answer, self.gateway_client, self.target_id, self.name)
         1161
         1162         for temp_arg in temp_args:

      C:\Spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw)
           61     def deco(*a, **kw):
           62         try:
      ---> 63             return f(*a, **kw)
           64         except py4j.protocol.Py4JJavaError as e:
           65             s = e.java_exception.toString()

      C:\Spark\spark-2.3.0-bin-hadoop2.7\python\lib\py4j-0.10.6-src.zip\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
          318                 raise Py4JJavaError(
          319                     "An error occurred while calling {0}{1}{2}.\n".
      --> 320                     format(target_id, ".", name), value)
          321             else:
          322                 raise Py4JError(

      Py4JJavaError: An error occurred while calling o53.showString.
      : java.lang.NoClassDefFoundError: com/eclipsesource/json/Json
          at com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:150)
          at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
          at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
          at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:338)
          at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:337)
          at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:415)
          at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:333)
          at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
          at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
          at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
          at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
          at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
          at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
          at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
          at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
          at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
          at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
          at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
          at scala.collection.Iterator$class.foreach(Iterator.scala:893)
          at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
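For what it's worth, java.lang.NoClassDefFoundError: com/eclipsesource/json/Json thrown from RedshiftRelation.buildScan suggests a missing transitive dependency: spark-redshift uses the minimal-json library (com.eclipsesource.minimal-json), which is not among the jars copied above. A hedged sketch of one way to let Spark resolve transitive dependencies automatically, by declaring Maven coordinates instead of hand-copying jars (the coordinates below are assumptions matching the versions listed, and spark.jars.packages only takes effect if set before the JVM starts):

from pyspark.sql import SparkSession

# Sketch only: declare Maven coordinates so Spark pulls transitive
# dependencies (including minimal-json) rather than relying on
# manually copied jars. Coordinates are assumptions based on the
# jar versions listed in the steps above.
spark = (
    SparkSession.builder
    .appName("redshift-read")
    .config(
        "spark.jars.packages",
        "com.databricks:spark-redshift_2.10:3.0.0-preview1,"
        "org.apache.hadoop:hadoop-aws:2.7.1",
    )
    .getOrCreate()  # must run before any SparkContext/JVM exists
)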

0 Answers:

No answers