Question

I have a Hadoop table created with the following code:

create table XXXX 
(...some data definitions...)
row format delimited
WITH SERDEPROPERTIES ('field.delim' = '^')
(...some other properties...)

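I then went to HDFS, found the corresponding table under the database, and uploaded a csv file into it. The columns in my csv file follow the order defined in the create table statement, with the partition columns at the end. After the csv file was uploaded successfully, my 'select * from mydataset' query returned no results. When I open the csv file, something seems off, yet the '^' delimiters and the data fields are still there.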

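I would like to know what the problem is, and whether the result would be different if I used {row format delimited fields terminated by '^'} instead?
Is what I did, uploading the csv file, the same as using a load data inpath statement? Could I use a load data inpath statement instead, and would it be faster than uploading the csv file manually?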

Thanks.

Answer 1

There are two ways to upload data into a Hive table:
1) Use the LOAD command.
2) Follow the steps below.
    Step 1: Create a folder on HDFS (example: hadoop fs -mkdir /user/Username/orders)
    Step 2: Upload the files to that folder (example: hadoop fs -put csvfiles /user/Username/orders/)
    Step 3: Create an external Hive table on top of that folder. After this you can query and test the data.
            Example:
              -- STORED AS must come before LOCATION in Hive DDL
              CREATE EXTERNAL TABLE ordersfeed(
                order_id BIGINT,
                order_name STRING
              )
              ROW FORMAT DELIMITED
                FIELDS TERMINATED BY ','
              STORED AS TEXTFILE
              LOCATION '/user/Username/orders';
    Step 4: Create an internal (managed) Hive table
          CREATE TABLE ordersdata(
            order_id BIGINT,
            order_name STRING
          )
          STORED AS ORC;
    Step 5: Insert the data from the external table into the internal table
          Example:
            INSERT INTO TABLE ordersdata
            SELECT * FROM ordersfeed;
Note:
  1) The delimiter used in the CSV file and the delimiter declared for the external table must be the same.
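
Option 1, the load command, is not spelled out above, so here is a minimal sketch of it. It assumes a hypothetical text-format table named orders_raw whose declared delimiter already matches the files. LOAD DATA does not parse or convert anything; it only places the files in the table's directory. The local path /tmp/csvfiles is also an assumption made for illustration.

```sql
-- Hypothetical target table; its delimiter must match the files being loaded.
CREATE TABLE orders_raw (
  order_id   BIGINT,
  order_name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Alternative A: the file is already on HDFS.
-- The file is moved from this path into the table's directory.
LOAD DATA INPATH '/user/Username/orders/csvfiles' INTO TABLE orders_raw;

-- Alternative B: the file is on a local filesystem ('/tmp/csvfiles' is an assumed path).
-- With LOCAL the file is copied rather than moved.
LOAD DATA LOCAL INPATH '/tmp/csvfiles' INTO TABLE orders_raw;
```

For a partitioned table, the same statement takes a PARTITION (col = 'value') clause that names the target partition.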

Answer 2

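Update to my question:

I found that my table is a partitioned table, and simply uploading the csv file into the table folder does not load the data into that table. Instead, INSERT OVERWRITE TABLE should be used with static or dynamic partitioning.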
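
A rough sketch of both variants, under stated assumptions: the table names orders_staging and orders_part, the partition column order_date, the HDFS path, and the date value are all hypothetical and not taken from my real table. The idea is to expose the uploaded '^'-delimited files through an unpartitioned staging table, then insert from it into the partitioned table.

```sql
-- Hypothetical staging table over the uploaded '^'-delimited files.
-- The partition value is carried as the last column of each row.
CREATE EXTERNAL TABLE orders_staging (
  order_id   BIGINT,
  order_name STRING,
  order_date STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '^'
STORED AS TEXTFILE
LOCATION '/user/Username/orders_staging';

-- Hypothetical partitioned target table.
CREATE TABLE orders_part (
  order_id   BIGINT,
  order_name STRING
)
PARTITIONED BY (order_date STRING);

-- Static partition: the partition value is fixed in the statement,
-- so the SELECT list contains only the non-partition columns.
INSERT OVERWRITE TABLE orders_part PARTITION (order_date = '2020-01-01')
SELECT order_id, order_name
FROM orders_staging
WHERE order_date = '2020-01-01';

-- Dynamic partition: Hive takes the partition value from the last
-- column of the SELECT list, one partition per distinct value.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE orders_part PARTITION (order_date)
SELECT order_id, order_name, order_date
FROM orders_staging;
```

The static form suits loading one known partition at a time; the dynamic form suits a file that mixes rows for many partitions.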

Upload a dataset as a csv file to Hive