Installing rgdal and rgeos on Azure Databricks

Time: 2019-02-24 20:35:06

Tags: r azure gdal databricks geos

I can't install rgdal and rgeos on Databricks. Any suggestions?

configure: error: gdal-config not found or not executable.
ERROR: configuration failed for package ‘rgdal’
* removing ‘/databricks/spark/R/lib/rgdal’

configure: error: geos-config not found or not executable.
ERROR: configuration failed for package ‘rgeos’
* removing ‘/databricks/spark/R/lib/rgeos’

1 answer:

Answer 0 (score: 1)

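Here is one way to install rgdal and rgeos for R on Azure Databricks. Steps 1 and 2 must both be run every time the cluster starts. Step 1 can be automated (see below), but step 2 has to be run manually, either in a separate script or by adding it to the top of your R script.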

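Step 1

You first need to install gdal and geos on the Linux machines in the cluster. This can be done with a bash script in a Databricks notebook. %sh is the magic command that lets the cell run a shell script.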

%sh
#!/bin/bash

#Start by refreshing the apt package index
sudo apt-get update

##############
#### rgdal

#This installs gdal on the linux machine but not the R library (done in R script)
#See https://databricks.com/notebooks/rasterframes-notebook.html
sudo apt-get install -y gdal-bin libgdal-dev

#To be able to install the R library, you also need libproj-dev 
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libproj-dev 

##############
#### rgeos

#This installs geos on the linux machine but not the R library (done in R script)
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libgeos++-dev
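Both errors in the question come from a missing `gdal-config` or `geos-config` helper, so it can be worth confirming that they are on the PATH before moving on to the R packages. The following is a minimal sketch for a Python notebook cell; the function name `missing_tools` is made up for illustration, and the check only covers the machine the cell runs on.

```python
import shutil

# The two helper programs named in the configure errors from the question.
REQUIRED_TOOLS = ["gdal-config", "geos-config"]


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("gdal-config and geos-config were both found.")
```

If either tool is reported as missing, the R package for that library will most likely fail to configure in the same way as in the question.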

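Running this by hand every time is tedious, though, so you can create an init script that runs every time the cluster starts. In a Databricks Python notebook, copy the following code into a cell. Any script in dbfs:/databricks/init/<name_of_cluster> runs at startup of the cluster with that name.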

# This cell creates a bash script called install_packages.sh. The cluster runs this script on each startup.
# The contents of the bash script are whatever is inside the variable `script`.

clusterName = "RStudioCluster"
script = """#!/bin/bash

#Start by refreshing the apt package index
sudo apt-get update

##############
#### rgdal

#This installs gdal on the linux machine but not the R library (done in R script)
#See https://databricks.com/notebooks/rasterframes-notebook.html
sudo apt-get install -y gdal-bin libgdal-dev

#To be able to install the R library, you also need libproj-dev 
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libproj-dev 

##############
#### rgeos

#This installs geos on the linux machine but not the R library (done in R script)
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libgeos++-dev

"""
dbutils.fs.put("dbfs:/databricks/init/%s/install_packages.sh" % clusterName, script, True)
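The script text above repeats the %sh cell from step 1 line for line. As a rough sketch of one way to avoid keeping two copies in sync, the script could be generated from a package list instead. The helper names `build_init_script` and `init_script_path` are hypothetical, and `libgeos++-dev` is assumed to be the Debian/Ubuntu spelling of the geos package. The `dbutils.fs.put` call is the same one used above and only runs where `dbutils` exists, that is, inside a Databricks notebook.

```python
# apt packages used in step 1 (libgeos++-dev is an assumed package spelling).
APT_PACKAGES = ["gdal-bin", "libgdal-dev", "libproj-dev", "libgeos++-dev"]


def build_init_script(packages):
    """Build a bash script that installs the given apt packages."""
    lines = [
        "#!/bin/bash",
        "sudo apt-get update",
        "sudo apt-get install -y " + " ".join(packages),
    ]
    return "\n".join(lines) + "\n"


def init_script_path(cluster_name):
    """Return the cluster-named init script location used in this answer."""
    return "dbfs:/databricks/init/%s/install_packages.sh" % cluster_name


script = build_init_script(APT_PACKAGES)
path = init_script_path("RStudioCluster")
print(path)
print(script)

# dbutils is only defined inside Databricks notebooks, so guard the write.
if "dbutils" in globals():
    dbutils.fs.put(path, script, True)  # True overwrites an existing file
```

Printing the path and script before writing makes it easy to inspect what the cluster will run at startup.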

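Step 2

So far you have only installed gdal and geos on the Linux machines in the cluster. In this step you install the R packages rgdal and rgeos. However, at the time of writing, the latest version of rgdal is not compatible with the latest version of gdal available through apt-get. See here for more details and alternative workarounds; if an older rgdal is acceptable, the simplest fix is to install version 1.2-20 of rgdal. You can do this in a Databricks R notebook or in the RStudio app on Databricks: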

library(devtools)  # provides install_version(); stops with an error if devtools is not installed
install_version("rgdal", version="1.2-20")
install.packages("rgeos")
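Because the compatibility problem described in this step depends on which gdal version apt-get actually installed, it may help to look at the installed versions before picking an rgdal release. The sketch below runs `gdal-config --version` and `geos-config --version` from a Python notebook cell; the helper name `tool_version` is made up for illustration, and it returns None when the tool is not on the PATH.

```python
import shutil
import subprocess


def tool_version(tool):
    """Return the output of `<tool> --version`, or None if the tool is not on the PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    return result.stdout.strip()


for tool in ("gdal-config", "geos-config"):
    print(tool, "->", tool_version(tool))
```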

Setup complete

You can then load these libraries as usual:

library(rgdal)
library(rgeos)