How do I install the odbc package on a Databricks cluster?

Time: 2019-04-04 14:04:51

Tags: r odbc azure-databricks

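I need to access an Azure SQL database from an R notebook in Databricks. To do this, I intend to use the odbc package, which installs without problems on my local R instance.

I tried installing the package on the cluster through the Databricks UI, but it fails every time. I also tried the following code in a notebook: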

install.packages("odbc")

This results in:

Installing package into ‘/databricks/spark/R/lib’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/odbc_1.1.6.tar.gz'
Content type 'application/x-gzip' length 288033 bytes (281 KB)
==================================================
downloaded 281 KB

* installing *source* package ‘odbc’ ...
** package ‘odbc’ successfully unpacked and MD5 sums checked
PKG_CFLAGS=
PKG_LIBS=-lodbc
<stdin>:1:17: fatal error: sql.h: No such file or directory
compilation terminated.
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because odbc was not found. Try installing:
 * deb: unixodbc-dev (Debian, Ubuntu, etc)
 * rpm: unixODBC-devel (Fedora, CentOS, RHEL)
 * csw: unixodbc_dev (Solaris)
 * brew: unixodbc (Mac OSX)
To use a custom odbc set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘odbc’
* removing ‘/databricks/spark/R/lib/odbc’

The downloaded source packages are in
    ‘/tmp/RtmpqHp2QM/downloaded_packages’

I also tried installing from GitHub:

library(devtools)
devtools::install_github("r-dbi/odbc")

which gives a different error:

Downloading GitHub repo r-dbi/odbc@master
Installing 3 packages: assertthat, BH, Rcpp
Installing packages into ‘/databricks/spark/R/lib’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/assertthat_0.2.1.tar.gz'
Content type 'application/x-gzip' length 12742 bytes (12 KB)
==================================================
downloaded 12 KB

trying URL 'https://cloud.r-project.org/src/contrib/BH_1.69.0-1.tar.gz'
Content type 'application/x-gzip' length 12378154 bytes (11.8 MB)
==================================================
downloaded 11.8 MB

trying URL 'https://cloud.r-project.org/src/contrib/Rcpp_1.0.1.tar.gz'
Content type 'application/x-gzip' length 3661123 bytes (3.5 MB)
==================================================
downloaded 3.5 MB

* installing *source* package ‘assertthat’ ...
** package ‘assertthat’ successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (assertthat)
* installing *source* package ‘BH’ ...
** package ‘BH’ successfully unpacked and MD5 sums checked
** inst
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (BH)
* installing *source* package ‘Rcpp’ ...
** package ‘Rcpp’ successfully unpacked and MD5 sums checked
** libs
g++  -I/usr/share/R/include -DNDEBUG -I../inst/include/     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Date.cpp -o Date.o
g++  -I/usr/share/R/include -DNDEBUG -I../inst/include/     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Module.cpp -o Module.o
g++  -I/usr/share/R/include -DNDEBUG -I../inst/include/     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Rcpp_init.cpp -o Rcpp_init.o
g++  -I/usr/share/R/include -DNDEBUG -I../inst/include/     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c api.cpp -o api.o
g++  -I/usr/share/R/include -DNDEBUG -I../inst/include/     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c attributes.cpp -o attributes.o
g++  -I/usr/share/R/include -DNDEBUG -I../inst/include/     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c barrier.cpp -o barrier.o
g++ -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o Rcpp.so Date.o Module.o Rcpp_init.o api.o attributes.o barrier.o -L/usr/lib/R/lib -lR
installing to /databricks/spark/R/lib/Rcpp/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (Rcpp)

The downloaded source packages are in
    ‘/tmp/RtmpqHp2QM/downloaded_packages’
Error in processx::run(bin, args = real_cmdargs, stdout_line_callback = real_callback(stdout),  : 
  System command error
In addition: Warning messages:
1: In install.packages("odbc") :
  installation of package ‘odbc’ had non-zero exit status
2: In install.packages("odbc") :
  installation of package ‘odbc’ had non-zero exit status

Any idea why this package fails to install on Databricks, when it works fine locally and every other package I have tried to install on Databricks with the same syntax works?

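1 Answer:

Answer 0: (score: 0)

The best option for accessing a SQL database is to use the pre-installed JDBC connection (see the Documentation). If you want to use ODBC, you need unixODBC, as one of the comments mentioned: the `sql.h: No such file or directory` error in the question indicates that the unixODBC development headers (unixodbc-dev) are missing on the cluster. Init scripts are the best way to install several packages. The following Python code creates an init script for a pyodbc installation.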

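script = """
  # unixODBC driver manager and its development headers (unixodbc-dev provides sql.h)
  sudo apt-get -q -y install unixodbc unixodbc-dev
  # Python headers and the pyodbc bindings
  sudo apt-get -q -y install python3-dev
  sudo pip install pyodbc
  # Register Microsoft's signing key and package repository; piping through
  # "sudo tee" ensures the file is written with root privileges
  curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
  curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list > /dev/null
  sudo apt-get update
  # Microsoft ODBC driver for SQL Server
  sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql
"""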

dbutils.fs.put("/databricks/init/pyodbc/pyodbc.sh", script, True)
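
Since the question is about the R odbc package rather than pyodbc, the same init-script approach could be adapted roughly as follows. This is only a sketch under assumptions: the helper name `build_r_odbc_init_script`, the target path in the final comment, and the idea of calling `Rscript` from the init script are illustrative and not taken from the answer above. It also assumes the init script runs as root, so `sudo` is omitted. The CRAN mirror is the one that appears in the question's log.

```python
# Hypothetical helper (not part of any Databricks API): builds the text of an
# init script that installs the unixODBC headers and then the R odbc package.
def build_r_odbc_init_script(cran_repo="https://cloud.r-project.org"):
    lines = [
        "#!/bin/bash",
        "set -e",  # stop at the first failing command
        "apt-get update",
        # unixodbc-dev provides sql.h, which the odbc configure step looks for
        "apt-get -q -y install unixodbc unixodbc-dev",
        # Compile and install the R package once the headers are present
        f"Rscript -e 'install.packages(\"odbc\", repos=\"{cran_repo}\")'",
    ]
    return "\n".join(lines) + "\n"


script = build_r_odbc_init_script()
print(script)

# In a Databricks notebook, where dbutils is predefined, the script could then
# be written to DBFS in the same way as above (the path is an assumption):
# dbutils.fs.put("/databricks/init/<cluster-name>/r-odbc.sh", script, True)
```

The `dbutils.fs.put` call is left commented out so the block also runs outside Databricks. After a cluster restart with such a script in place, `library(odbc)` in an R notebook should load without the compile step failing, provided the script itself succeeded.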

Hope this helps.