For my unit tests, I run Spark locally on my laptop with a Hive context. On startup it creates two directories for its temporary files, one under /var and one under /tmp:
... INFO SessionState: Created local directory: /var/folders/h3/...
... INFO SessionState: Created HDFS directory: /tmp/hive/<username>/...
These folders are created by the org.apache.hadoop.hive.ql.session.SessionState class.
To avoid triggering a local security service, I need to redirect these directories to another folder, e.g. /Users/<username>/safe/.
How can I override these defaults so that the temporary folders are created under a path I specify?
Answer 0 (score: 0):
In standalone Hive, SessionState has several configurable parameters, all of which can be set in hive-site.xml:
SCRATCHDIR("hive.exec.scratchdir", "/tmp/hive",
"HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. " +
"For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, " +
"with ${hive.scratch.dir.permission}."),
LOCALSCRATCHDIR("hive.exec.local.scratchdir",
"${system:java.io.tmpdir}" + File.separator + "${system:user.name}",
"Local scratch space for Hive jobs"),
DOWNLOADED_RESOURCES_DIR("hive.downloaded.resources.dir",
"${system:java.io.tmpdir}" + File.separator + "${hive.session.id}_resources",
"Temporary local directory for added resources in the remote file system."),
HIVEHISTORYFILELOC("hive.querylog.location",
"${system:java.io.tmpdir}" + File.separator + "${system:user.name}",
"Location of Hive run time structured log file")
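For instance, the two directories from the question could be redirected with a hive-site.xml along these lines (a sketch: the paths under /Users/someuser/safe are only the folder suggested in the question, with someuser standing in for your user name):

```xml
<configuration>
  <!-- HDFS scratch dir; in local mode this still resolves against the local FS -->
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/Users/someuser/safe/scratch</value>
  </property>
  <!-- Local scratch space, by default under java.io.tmpdir -->
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/Users/someuser/safe/local</value>
  </property>
  <!-- Directory for resources added during the session -->
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/Users/someuser/safe/resources</value>
  </property>
</configuration>
```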
I'm not familiar with exactly how Spark embeds Hive, but I'm sure there is a hive-site.xml (the one linked is a test configuration file), and that is what controls the values of hive.exec.scratchdir, hive.exec.local.scratchdir, and the others.
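For a unit-test setup it can be convenient to generate such a hive-site.xml programmatically and place it on the test classpath before the Spark session starts. A minimal sketch in Python (the make_hive_site helper is mine, not part of any Spark or Hive API; only the property names come from the listing above):

```python
import xml.etree.ElementTree as ET

def make_hive_site(props):
    """Render a dict of Hive properties as a hive-site.xml document string."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")

# Redirect all of Hive's scratch locations under one "safe" folder.
safe_dir = "/Users/someuser/safe"  # placeholder path from the question
xml = make_hive_site({
    "hive.exec.scratchdir": safe_dir + "/scratch",
    "hive.exec.local.scratchdir": safe_dir + "/local",
    "hive.downloaded.resources.dir": safe_dir + "/resources",
})
print(xml)
```

The resulting string can then be written to a hive-site.xml file in a directory that the test's classpath (or HIVE_CONF_DIR) points at.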