Does anyone know where Parquet files are stored?

Time: 2017-06-07 17:26:49

Tags: hdfs parquet hue

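I inserted some records into a Hive Parquet table and everything worked perfectly, but I would like to inspect the files with the Hue file browser. Does anyone know where these files are located? And is there a property I can use when creating the table to change the location?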

This is my table definition:

import java.util.Collection;

public class Customer {

    private int id;
    private String name;
    private Collection<Order> orders;

    public int getId() {return id;}
    public void setId(int id) {this.id = id;}
    public String getName() {return name;}
    public void setName(String name) {this.name = name;}
    public Collection<Order> getOrders() {return orders;}
    public void setOrders(Collection<Order> orders) {this.orders = orders;}
}

public class Order {
    private int id;
    private String item;

    public int getId() {return id;}
    public void setId(int id) {this.id = id;}
    public String getItem() {return item;}
    public void setItem(String item) {this.item = item;}
}

public class CustomerOrderDto {
    private int customerId;
    private String customerName;
    private String orderId;
    private String orderItem;

    public int getCustomerId() {return customerId;}
    public void setCustomerId(int customerId) {this.customerId = customerId;}
    public String getCustomerName() {return customerName;}
    public void setCustomerName(String customerName) {this.customerName = customerName;}
    public String getOrderId() {return orderId;}
    public void setOrderId(String orderId) {this.orderId = orderId;}
    public String getOrderItem() {return orderItem;}
    public void setOrderItem(String orderItem) {this.orderItem = orderItem;}
}

2 answers:

Answer 0: (score: 0)

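You should be able to control the directory the files are written to. I am writing data from one Hive table to another while changing the format from text to Parquet, and I use the following commands: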

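// Run the SELECT (iSql) against the source Hive table
val hiveDF = hsc.sql(iSql)
// Append the result as Parquet files under the directory parquetLoc
hiveDF.coalesce(noExecutors).write.mode("append").parquet(parquetLoc)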

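Writing plain text files works in a similar way. As @Samson Scharfrichter mentioned, you can use the LOCATION option when creating the table to make a specific directory its source, but you need to make sure that nothing other than the table's data is written to that directory.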

CREATE EXTERNAL TABLE parquet_test_2 (
column1 int,
column2 int
)
STORED AS PARQUET 
LOCATION '{HDFS_DIR}'
TBLPROPERTIES ('parquet.compression'='SNAPPY');
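
As a rough sketch of how to check the result, the statement below asks Hive where it thinks the table lives. `DESCRIBE FORMATTED` is standard HiveQL; the table name is the one from the example above, and the exact output layout varies between Hive versions.

```sql
-- Print detailed table metadata. Look for the "Location:" row
-- and open that path in the Hue file browser.
DESCRIBE FORMATTED parquet_test_2;
```

For an external table created with LOCATION, the reported path should be the directory given in the CREATE statement.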

Answer 1: (score: 0)

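1. A table is stored under the location of the database that contains it. If you do not name the database explicitly (create table my_database.mytable ...), the currently active database is used (by default it is `default`; it can be changed with use mydatabase;). The location of a database can be found with desc database my_database;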

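2. I strongly recommend treating Hive as a data warehouse rather than as a temp directory on your PC. Data should be organized by database, rather than by choosing the location of each table separately.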

Demo

create database prod;    
desc database prod;
+---------+---------+-------------------------------------------------------------+------------+------------+------------+
| db_name | comment |                          location                           | owner_name | owner_type | parameters |
+---------+---------+-------------------------------------------------------------+------------+------------+------------+
| prod    |         | hdfs://quickstart.cloudera:8020/user/hive/warehouse/prod.db | hive       | USER       |            |
+---------+---------+-------------------------------------------------------------+------------+------------+------------+
use prod;
create table my_prod_table as select 'This is a PROD table';    
show table extended in prod like my_prod_table;
...
location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/prod.db/my_prod_table
...
create database playground location '/tmp/my_hive_playground'; 
desc database playground;
+------------+---------+--------------------------------------------------------+------------+------------+------------+
|  db_name   | comment |                        location                        | owner_name | owner_type | parameters |
+------------+---------+--------------------------------------------------------+------------+------------+------------+
| playground |         | hdfs://quickstart.cloudera:8020/tmp/my_hive_playground | hive       | USER       |            |
+------------+---------+--------------------------------------------------------+------------+------------+------------+
create table playground.my_playground_table as select 'This is a Playground table';    
show table extended in playground like my_playground_table;
...
location:hdfs://quickstart.cloudera:8020/tmp/my_hive_playground/my_playground_table
...
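
Applied to the Parquet case from the question, the same idea might look like the sketch below. The table name `my_parquet_table` and its columns are hypothetical and only serve as an illustration; the `playground` database is the one created in the demo.

```sql
USE playground;

-- Hypothetical Parquet table created without an explicit LOCATION,
-- so it inherits its directory from the playground database.
CREATE TABLE my_parquet_table (
  id INT,
  name STRING
)
STORED AS PARQUET;

-- Print the full DDL; the LOCATION clause shows where the files will be written.
SHOW CREATE TABLE my_parquet_table;
```

Under the assumptions above, the reported location should sit below /tmp/my_hive_playground, which is the path to browse in Hue after inserting data.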