PySpark kernel in Jupyter Notebook - cannot connect to remote Hive metastore

Date: 2019-07-08 11:17:10

Tags: hive pyspark cloudera-cdh hive-metastore jupyter-kernel

I am trying to access Hive tables from a Jupyter notebook using a PySpark kernel. I can instantiate a Spark session, but it cannot connect to the Hive metastore, so I cannot access my databases.

I can access the databases from spark-shell, but I need the same access from the Jupyter notebook.
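For context, this is roughly the kind of access that fails in the notebook (a minimal sketch, not my exact code; 'mydb.mytable' stands in for a real database and table):

from pyspark.sql import SparkSession

# Building the session itself succeeds in the notebook.
spark = SparkSession.builder \
    .appName("notebook") \
    .enableHiveSupport() \
    .getOrCreate()

# This fails with "Table or view not found"; the databases from the
# remote metastore are simply not visible to the session.
spark.sql("SELECT * FROM mydb.mytable").show()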

I have already tried specifying the metastore Thrift server address in PYSPARK_SUBMIT_ARGS in the kernel file (a sketch of that attempt follows the kernel file below).

Jupyter kernel file:

{
"display_name": "PySpark2",
 "language": "python",
 "argv": [
  "/opt/anaconda3/bin/python",
  "-m",
  "ipykernel",
  "-f",
  "{connection_file}"
 ],
 "env": {
  "JAVA_HOME": "/usr/java/jdk1.8.0_191-amd64",
  "HADOOP_CONF_DIR": "/etc/hadoop/conf",
  "HADOOP_CONF_LIB_NATIVE_DIR": "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib/hadoop/lib/native",
  "YARN_CONF_DIR": "/etc/hadoop/conf",
  "PYTHONPATH": "/opt/anaconda3/bin/python3.7:/opt/anaconda3/lib/python3.7/site-packages:/opt/cloudera/parcels/SPARK2-2.4.0.cloudera1-1.cdh5.13.3.p0.1007356/lib/spark2/python:/opt/cloudera/parcels/SPARK2-2.4.0.cloudera1-1.cdh5.13.3.p0.1007356/lib/spark2/python/lib/py4j-0.10.7-src.zip",
  "SPARK_HOME": "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera1-1.cdh5.13.3.p0.1007356/lib/spark2/",
  "PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera1-1.cdh5.13.3.p0.1007356/lib/spark2/python/pyspark/shell.py",
  "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client pyspark-shell "
 }
}
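The PYSPARK_SUBMIT_ARGS variant with the Thrift address mentioned above looked roughly like this (the metastore host is a placeholder; it made no difference):

  "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client --conf spark.hadoop.hive.metastore.uris=thrift://<metastore-host>:9083 pyspark-shell"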

Here is the relevant message from the Spark session log:

INFO internal.SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/cda353/workspace/notebooks/spark-warehouse').

As I understand it, a local Hive metastore is being created when the Spark session is initialized.
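A quick check from the notebook is consistent with that (a sketch; it just reads the running session's own configuration):

# The warehouse dir points at a local path under my home directory,
# and only 'default' is listed -- consistent with a fresh local metastore.
print(spark.conf.get("spark.sql.warehouse.dir"))
spark.sql("SHOW DATABASES").show()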

I need to understand what changes are required in my Jupyter kernel file, or in other property files, so that the Spark session can reach the Hive metastore. The metastore runs on a separate node.
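For instance, is the right approach to set the metastore URI when the session is built, along these lines (a minimal sketch; <metastore-host> stands in for the actual metastore node), or does this belong in a hive-site.xml under $SPARK_HOME/conf?

from pyspark.sql import SparkSession

# Sketch: point the session at the remote metastore explicitly.
# <metastore-host> is a placeholder; 9083 is the default metastore port.
spark = SparkSession.builder \
    .config("hive.metastore.uris", "thrift://<metastore-host>:9083") \
    .enableHiveSupport() \
    .getOrCreate()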

0 Answers:

No answers yet.