Redirecting embedded Hive away from the /var and /tmp directories

Time: 2017-05-09 06:30:56

Tags: hadoop apache-spark hive

For my unit tests, I run a local Spark with a Hive context on my laptop. At startup it creates two directories for its temporary files, one under /var and one under /tmp:

... INFO SessionState: Created local directory: /var/folders/h3/...
... INFO SessionState: Created HDFS directory: /tmp/hive/<username>/...

These folders are created by the org.apache.hadoop.hive.ql.session.SessionState class.

To avoid triggering some local security services, I need to redirect these directories to another folder, for example /Users/<username>/safe/

How can I override these defaults so that the temporary folders are created under a path I specify?

1 Answer:

Answer 0: (score: 0)

In standalone Hive, SessionState has several configurable parameters, all of which can be set from hive-site.xml:

SCRATCHDIR("hive.exec.scratchdir", "/tmp/hive",
    "HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. " +
    "For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, " +
    "with ${hive.scratch.dir.permission}."),

LOCALSCRATCHDIR("hive.exec.local.scratchdir",
    "${system:java.io.tmpdir}" + File.separator + "${system:user.name}",
    "Local scratch space for Hive jobs"),

DOWNLOADED_RESOURCES_DIR("hive.downloaded.resources.dir",
    "${system:java.io.tmpdir}" + File.separator + "${hive.session.id}_resources",
    "Temporary local directory for added resources in the remote file system."),

HIVEHISTORYFILELOC("hive.querylog.location",
    "${system:java.io.tmpdir}" + File.separator + "${system:user.name}",
    "Location of Hive run time structured log file")

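Three of the defaults listed here are built from the JVM system properties java.io.tmpdir and user.name. The following is a minimal sketch, using only the Java standard library, that prints what the hive.exec.local.scratchdir default would resolve to on the current machine. The class and method names are made up for illustration and are not part of Hive.

```java
import java.io.File;

public class ScratchDirDefaults {

    // Mirrors the default expression shown in the listing:
    // "${system:java.io.tmpdir}" + File.separator + "${system:user.name}"
    // (hypothetical helper, not a Hive API)
    static String defaultLocalScratchDir() {
        return System.getProperty("java.io.tmpdir")
                + File.separator
                + System.getProperty("user.name");
    }

    public static void main(String[] args) {
        // Print the raw system property and the path derived from it.
        System.out.println("java.io.tmpdir           = " + System.getProperty("java.io.tmpdir"));
        System.out.println("default local scratchdir = " + defaultLocalScratchDir());
    }
}
```

On macOS, java.io.tmpdir typically points somewhere under /var/folders/, which would explain the first log line in the question. Starting the test JVM with -Djava.io.tmpdir=<some path> should therefore move every default that is derived from it, although explicit settings in hive-site.xml are the more direct route.
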
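I'm not familiar with exactly how Spark embeds Hive, but I'm sure there is a hive-site.xml (the one linked is a test configuration file) that controls the values of hive.exec.scratchdir, hive.exec.local.scratchdir, and the other parameters.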
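As a rough sketch, a hive-site.xml that overrides all four properties could look like the fragment below. It assumes that the embedded Hive picks up a hive-site.xml from the test classpath (for example src/test/resources), which has not been verified here. USERNAME stands in for the real user name, and the subdirectory names under safe/ are arbitrary choices for illustration.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Replaces the /tmp/hive default ("HDFS" scratch dir, local in a local test run) -->
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/Users/USERNAME/safe/hive</value>
  </property>
  <!-- Replaces ${system:java.io.tmpdir}/${system:user.name} -->
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/Users/USERNAME/safe/hive-local</value>
  </property>
  <!-- Replaces ${system:java.io.tmpdir}/${hive.session.id}_resources -->
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/Users/USERNAME/safe/hive-resources</value>
  </property>
  <!-- Replaces ${system:java.io.tmpdir}/${system:user.name} for the query log -->
  <property>
    <name>hive.querylog.location</name>
    <value>/Users/USERNAME/safe/hive-querylog</value>
  </property>
</configuration>
```

If the two "Created ... directory" log lines still show the old paths after this, the file is probably not being read, and the classpath location is the first thing to check.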