I am trying to access Hive tables from a Jupyter notebook using a PySpark kernel. I can instantiate a Spark session, but it cannot connect to the Hive metastore, so I cannot access my databases.
I am able to access the databases from spark-shell, but I also need access from within the Jupyter notebook.
I have already tried adding the Thrift server address to PYSPARK_SUBMIT_ARGS in the kernel file.
Jupyter kernel file:
{
  "display_name": "PySpark2",
  "language": "python",
  "argv": [
    "/opt/anaconda3/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "JAVA_HOME": "/usr/java/jdk1.8.0_191-amd64",
    "HADOOP_CONF_DIR": "/etc/hadoop/conf",
    "HADOOP_CONF_LIB_NATIVE_DIR": "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib/hadoop/lib/native",
    "YARN_CONF_DIR": "/etc/hadoop/conf",
    "PYTHONPATH": "/opt/anaconda3/bin/python3.7:/opt/anaconda3/lib/python3.7/site-packages:/opt/cloudera/parcels/SPARK2-2.4.0.cloudera1-1.cdh5.13.3.p0.1007356/lib/spark2/python:/opt/cloudera/parcels/SPARK2-2.4.0.cloudera1-1.cdh5.13.3.p0.1007356/lib/spark2/python/lib/py4j-0.10.7-src.zip",
    "SPARK_HOME": "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera1-1.cdh5.13.3.p0.1007356/lib/spark2/",
    "PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera1-1.cdh5.13.3.p0.1007356/lib/spark2/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client pyspark-shell"
  }
}
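One approach I have seen suggested is to pass the metastore URI through `PYSPARK_SUBMIT_ARGS` as a Spark conf rather than as a plain address. A sketch of how that entry in the `env` section might look (the `<metastore-host>` placeholder and port 9083 are assumptions; 9083 is only the conventional default Thrift port):

```json
"PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client --conf spark.hadoop.hive.metastore.uris=thrift://<metastore-host>:9083 pyspark-shell"
```

Alternatively, if a valid `hive-site.xml` is placed in `$SPARK_HOME/conf` (or in `HADOOP_CONF_DIR`), Spark should pick up the metastore location from there without any submit args.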
As far as I understand, when the Spark session is initialized, a local Hive metastore is being created instead.
I need to know what changes are required in my Jupyter kernel file, or in some other configuration file, so that the Spark session can reach the Hive metastore. The Hive metastore runs on a separate node.
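For reference, the session can also be pointed at the remote metastore directly from notebook code, without touching the kernel file. This is only a sketch under assumptions: the `<metastore-host>` placeholder must be replaced with the metastore node's address, and it needs a live cluster to actually run. The key parts are `hive.metastore.uris` and `enableHiveSupport()`; without the latter, Spark falls back to its in-memory catalog with a local Derby metastore, which matches the symptom described above.

```python
from pyspark.sql import SparkSession

# Point Spark at the remote Hive metastore instead of letting it
# create a local Derby metastore next to the notebook.
# thrift://<metastore-host>:9083 is a placeholder URI.
spark = (
    SparkSession.builder
    .appName("jupyter-hive")
    .config("hive.metastore.uris", "thrift://<metastore-host>:9083")
    .enableHiveSupport()  # use the Hive catalog, not the in-memory one
    .getOrCreate()
)

# If the metastore is reachable, this should list the Hive databases
spark.sql("SHOW DATABASES").show()
```

Note that the config must be set before the first `SparkSession` is created in the notebook; once a session exists, `getOrCreate()` returns it unchanged and the metastore setting is ignored.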