I can't install rgdal and rgeos on Databricks. Any suggestions?
configure: error: gdal-config not found or not executable.
ERROR: configuration failed for package ‘rgdal’
* removing ‘/databricks/spark/R/lib/rgdal’
configure: error: geos-config not found or not executable.
ERROR: configuration failed for package ‘rgeos’
* removing ‘/databricks/spark/R/lib/rgeos’
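Answer 0 (score: 1)
This is one way to install rgdal and rgeos for R on Azure Databricks. Both step 1 (installing the system libraries) and step 2 (installing the R packages) have to be run every time the cluster starts. Step 1 can be automated (see below), but step 2 has to be run manually in a separate script or added to the top of your R script.
First, install gdal and geos on the Linux machines in the cluster. You can do this with a bash script in a Databricks notebook. %sh is the magic command that lets the cell run a shell script.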
%sh
#!/bin/bash
#Start by updating everything
sudo apt-get update
##############
#### rgdal
#This installs gdal on the linux machine but not the R library (done in R script)
#See https://databricks.com/notebooks/rasterframes-notebook.html
sudo apt-get install -y gdal-bin libgdal-dev
#To be able to install the R library, you also need libproj-dev
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libproj-dev
##############
#### rgeos
#This installs geos on the linux machine but not the R library (done in R script)
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libgeos++-dev
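The configure errors in the question come from gdal-config and geos-config not being found, so it can help to confirm that both are on the PATH after the script above has run. The following is only a minimal sketch in Python, assuming it runs in a Python cell on the same machine as the %sh cell; the helper name find_config_tools is made up for illustration and is not part of any Databricks API.

```python
import shutil

# Tools whose absence caused the configure errors quoted in the question.
REQUIRED_TOOLS = ("gdal-config", "geos-config")


def find_config_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its full path, or to None if it is not on PATH.

    find_config_tools is a hypothetical helper name used for illustration.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_config_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

If either tool is reported as NOT FOUND, installing the R packages is likely to fail with the same configure error as in the question.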
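However, running this by hand every time is tedious, so you can create an init script that runs every time the cluster starts. In a Databricks Python notebook, copy the following code into a cell. Any script in dbfs:/databricks/init/<name_of_cluster> runs at startup of the cluster with that name.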
#This cell creates a bash script called install_packages.sh. The cluster runs this file on each startup.
# The bash script will be anything inside the variable script
clusterName = "RStudioCluster"
script = """#!/bin/bash
#Start by updating everything
sudo apt-get update
##############
#### rgdal
#This installs gdal on the linux machine but not the R library (done in R script)
#See https://databricks.com/notebooks/rasterframes-notebook.html
sudo apt-get install -y gdal-bin libgdal-dev
#To be able to install the R library, you also need libproj-dev
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libproj-dev
##############
#### rgeos
#This installs geos on the linux machine but not the R library (done in R script)
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libgeos++-dev
"""
dbutils.fs.put("dbfs:/databricks/init/%s/install_packages.sh" % clusterName, script, True)
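Because the same apt commands appear in both the %sh cell and the init script, one option is to generate the script text from a package list. This is just a sketch under assumptions: build_init_script and init_script_path are hypothetical helper names, the package list mirrors the script above (with the GEOS package spelled libgeos++-dev), and the dbutils call is left as a comment because dbutils only exists inside a Databricks notebook.

```python
# System packages taken from the script above.
APT_PACKAGES = ["gdal-bin", "libgdal-dev", "libproj-dev", "libgeos++-dev"]


def build_init_script(apt_packages):
    """Return the text of a bash script that installs the given apt packages."""
    lines = ["#!/bin/bash", "sudo apt-get update"]
    lines += [f"sudo apt-get install -y {pkg}" for pkg in apt_packages]
    return "\n".join(lines) + "\n"


def init_script_path(cluster_name, file_name="install_packages.sh"):
    """Build the DBFS path used in the answer for a cluster-named init script."""
    return "dbfs:/databricks/init/%s/%s" % (cluster_name, file_name)


if __name__ == "__main__":
    print(init_script_path("RStudioCluster"))
    print(build_init_script(APT_PACKAGES))
    # Inside a Databricks notebook, the script could then be written with:
    # dbutils.fs.put(init_script_path("RStudioCluster"),
    #                build_init_script(APT_PACKAGES), True)
```

Keeping the package list in one place means a change only has to be made once.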
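So far you have only installed gdal and geos on the Linux machines in the cluster. In this step you install the R package rgdal. However, the latest version of rgdal is not compatible with the latest version of gdal available through apt-get. For more details and alternative ways to solve this, see here, but if you are fine with an older version of rgdal, the simplest fix is to install version 1.2-20 of rgdal. You can do this in a Databricks R notebook or in the RStudio Databricks app: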
library(devtools)
install_version("rgdal", version="1.2-20")
install.packages("rgeos")
You can then load these libraries as usual:
library(rgdal)
library(rgeos)