Connection between R and Hive

Asked: 2014-10-09 10:47:44

Tags: r hive

I am trying to establish a connection between RStudio (on my machine) and Hive (which is set up on a different server). Here is my R code:

install.packages("RJDBC",dep=TRUE)
require(RJDBC)

drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver", 
       classPath = list.files("C:/Users/37/Downloads/hive-jdbc-0.10.0.jar",
       pattern="jar$",full.names=T),
       identifier.quote="'")

Running the commands above produces the following error:

Error in .jfindClass(as.character(driverClass)[1]) : class not found

conn <- dbConnect(drv, "jdbc:hive2://65.11.23.453:10000/default", "admin", "admin")

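I downloaded the jar files from here and placed them in the CLASSPATH. Please let me know whether I am doing something wrong and how to get this to work.

Thanks.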

6 Answers:

Answer 0: (score: 1)

If you are on Cloudera, check your version and download the matching jars. For example, for CDH 5.9.1: hadoop-common-2.6.0-cdh5.9.1.jar and hive-jdbc-1.1.1-standalone.jar.

Copy the jars to a folder on the R host and load them from there when creating the JDBC driver.
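A minimal sketch of loading the copied jars from R, assuming they all sit in one folder. The helper name connect_hive, the folder path, the host, the port and the credentials are placeholders and not part of the original answer; the driver class is the HiveServer2 one used in the question.

```r
# Hypothetical helper: build an RJDBC connection from a folder of jars.
connect_hive <- function(jar_dir, url, user = "", password = "") {
  # Collect every jar in the folder; an empty result usually means a wrong path.
  jars <- list.files(jar_dir, pattern = "\\.jar$", full.names = TRUE)
  if (length(jars) == 0) {
    stop("no jar files found in ", jar_dir)
  }
  if (!requireNamespace("RJDBC", quietly = TRUE)) {
    stop("the RJDBC package is not installed")
  }
  # Pass all jars as the driver's classpath, then open the connection.
  drv <- RJDBC::JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
                     classPath = jars)
  DBI::dbConnect(drv, url, user, password)
}

# Usage (placeholder folder, host and credentials):
# conn <- connect_hive("/path/to/jars", "jdbc:hive2://host:10000/default",
#                      "admin", "admin")
# DBI::dbGetQuery(conn, "show databases")
```

Failing early on an empty jar list gives a clearer message than the later "class not found" error from the JVM.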

Answer 1: (score: 0)

I tried this sample code and it worked for me:

library(RJDBC)

#Load Hive JDBC driver
hivedrv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver",
                c(list.files("/home/amar/hadoop/hadoop",pattern="jar$",full.names=T),
                  list.files("/home/amar/hadoop/hive/lib",pattern="jar$",full.names=T)))

#Connect to Hive service
hivecon <- dbConnect(hivedrv, "jdbc:hive://ip:port/default")
query = "select * from mytable LIMIT 10"
hres <- dbGetQuery(hivecon, query)

Answer 2: (score: 0)

I hit the same error when trying to connect to Cassandra with RJDBC. It was resolved by putting the Cassandra JDBC dependencies on the Java classpath.

See this answer.
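Since "class not found" means the driver class is not on the classpath, one way to check is to look inside the jars. The following is a rough sketch using only base R (a jar is a zip archive); the helper names jar_class_entry and jar_has_class and the directory in the usage comment are made up for illustration.

```r
# Hypothetical helper: map a Java class name to its entry name inside a jar.
jar_class_entry <- function(class_name) {
  # "org.apache.hive.jdbc.HiveDriver" -> "org/apache/hive/jdbc/HiveDriver.class"
  paste0(gsub(".", "/", class_name, fixed = TRUE), ".class")
}

# Hypothetical helper: TRUE if the jar exists and contains the given class.
jar_has_class <- function(jar, class_name) {
  if (!file.exists(jar)) return(FALSE)
  entries <- utils::unzip(jar, list = TRUE)$Name
  jar_class_entry(class_name) %in% entries
}

# Usage (placeholder directory): list the jars that provide the driver class.
# jars <- list.files("/path/to/jars", pattern = "\\.jar$", full.names = TRUE)
# jars[vapply(jars, jar_has_class, logical(1),
#             class_name = "org.apache.hive.jdbc.HiveDriver")]
```

If no jar is returned, the driver class name and the downloaded jar version probably do not match.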

Answer 3: (score: 0)

For anyone who finds this post, here are a few things you can try:

1.) Reinstall rJava from source:

install.packages("rJava", repos = "http://rforge.net/", type = "source")

2.) Turn on debug output for the Java class loader and try to connect again:

.jclassLoader()$setDebug(1L)

3.) Previously I had to use both dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/server/libjvm.dylib') and Sys.setenv(JAVA_HOME = "/Path/to/java") to find the right JVM library.

4.) As mentioned in 'rJava load error in RStudio/R after "upgrading" to OSX Yosemite', you can also create a link between libjvm.dylib and /usr/local/lib.

If all of that fails, uninstalling and reinstalling R also worked for me.
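To see which JVM library a given Java home actually provides before calling dyn.load, something like the following base R sketch may help. The helper name find_libjvm is hypothetical, and the file name pattern assumes the usual libjvm.dylib (macOS), libjvm.so (Linux) and jvm.dll (Windows) names.

```r
# Hypothetical helper: list JVM shared libraries below a Java home directory.
find_libjvm <- function(java_home = Sys.getenv("JAVA_HOME")) {
  # An unset or non-existent JAVA_HOME yields an empty result.
  if (!nzchar(java_home) || !dir.exists(java_home)) {
    return(character(0))
  }
  # Search recursively, since the library lives in a platform-specific subfolder.
  list.files(java_home,
             pattern = "^(libjvm\\.(dylib|so)|jvm\\.dll)$",
             recursive = TRUE, full.names = TRUE)
}

# Usage: an empty result suggests JAVA_HOME points at the wrong directory.
# find_libjvm()
```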

Answer 4: (score: 0)

This is what has worked for me so far.

1) First check whether the Hive service is running; if it is not, restart it.

sudo service hive-server2 status
sudo service hive-server2 restart

2) Install rJava and RJDBC in R.

# Set JVM options before rJava/RJDBC are loaded; once the JVM has started they are ignored
options(java.parameters = '-Xmx8g')

library(rJava)
library(RJDBC)
hadoop_jar_dirs <- c('/usr/lib/hadoop/lib',
                     '/usr/lib/hadoop',
                     '/usr/lib/hive/lib')
clpath <- c()
for (d in hadoop_jar_dirs) {
  clpath <- c(clpath, list.files(d, pattern = 'jar', full.names = TRUE))
}
.jinit(classpath = clpath)
.jaddClassPath(clpath)

hive_jdbc_jar <- '/usr/lib/hive/lib/hive-jdbc-2.1.1.jar'
hive_driver <- 'org.apache.hive.jdbc.HiveDriver'
hive_url <- 'jdbc:hive2://localhost:10000/default'
drv <- JDBC(hive_driver, hive_jdbc_jar)
conn <- dbConnect(drv, hive_url)
show_databases <- dbGetQuery(conn, "show databases")

show_databases

Make sure to supply the correct values for hadoop_jar_dirs, hive_jdbc_jar and hive_driver.

Answer 5: (score: 0)

I wrote a package that handles this (and Kerberos):
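A quick way to verify those paths before starting the JVM is sketched below in base R. The helper name missing_hive_paths is hypothetical; the paths in the usage comment are the ones from the code above.

```r
# Hypothetical helper: return the configured paths that do not exist.
missing_hive_paths <- function(jar_dirs, jdbc_jar) {
  c(jar_dirs[!dir.exists(jar_dirs)],   # directories expected to hold jars
    jdbc_jar[!file.exists(jdbc_jar)])  # the Hive JDBC driver jar itself
}

# Usage with the values from the answer above; an empty result means all paths exist.
# missing_hive_paths(c('/usr/lib/hadoop/lib', '/usr/lib/hadoop', '/usr/lib/hive/lib'),
#                    '/usr/lib/hive/lib/hive-jdbc-2.1.1.jar')
```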

devtools::install_github('nfultz/hiveuberjar')

require(DBI)
con <- dbConnect(hiveuberjar::HiveUber(), url="jdbc://host:port/schema")
dbListTables(con)
dbGetQuery(con, "Select count(*) from nfultz.iris limit 10")