Sqoop and Hive integration: loading data into a partitioned Hive table

Posted: 2018-12-12 02:44:11

Tags: hadoop hive sqoop

I am using Sqoop on my local machine to import data from MySQL into Hive. The use case is as follows:

  1. The table customer2 exists in MySQL as shown below:

      mysql> select * from customer2;
      +-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
      | customer_id | customer_fname | customer_lname | customer_email | customer_password | customer_street | customer_city | customer_state | customer_zipcode |
      +-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
      |      100008 | Christine      | K              | NULL           | NULL              | HK              | HK            | HK             | 19293            |
      |      100009 | Chris          | Taylor         | NULL           | NULL              | HK              | HK            | HK             | 1925             |
      |      100010 | Mark           | Jamiel         | NULL           | NULL              | HK              | HK            | HK             | 19294            |
      |      100011 | Tom            | Pride          | NULL           | NULL              | HK              | HK            | HK             | 19295            |
      |      100012 | Tom            | Heather        | NULL           | NULL              | CA              | CA            | CA             | 19295            |
      |      100013 | Maxim          | Calay          | NULL           | NULL              | CA              | CA            | CA             | 19295            |
      +-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
    
  2. The same table customer2 exists in Hive, partitioned by customer_city, as shown below (a DDL sketch follows this listing):

    hive (sumitpawar)> describe customer2;
    OK
    col_name                data_type       comment
    customer_id             int
    customer_fname          string
    customer_lname          string
    customer_email          string
    customer_password       string
    customer_zipcode        string
    customer_city           string

    Partition Information
    col_name                data_type       comment

    customer_city           string
    Time taken: 0.098 seconds, Fetched: 12 row(s)
    hive (sumitpawar)>
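
     For reference, a sketch of the DDL such a partitioned table could be created with, reconstructed from the describe output above (the ROW FORMAT clause is an assumption on my part; the actual CREATE TABLE statement is not shown here):

         # Hypothetical DDL matching the describe output; the delimiter is a guess
         hive -e "
         CREATE TABLE IF NOT EXISTS sumitpawar.customer2 (
             customer_id       INT,
             customer_fname    STRING,
             customer_lname    STRING,
             customer_email    STRING,
             customer_password STRING,
             customer_zipcode  STRING
         )
         PARTITIONED BY (customer_city STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';"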
    
  3. I then import the data into Hive using the Sqoop command below, expecting the rows to be placed in the appropriate partition:

Sqoop command:

sqoop import \
--options-file ./options_file.txt \
--table customer2 \
--columns  'customer_id,customer_fname,customer_lname,customer_email,customer_password,customer_zipcode' \
--hive-database sumitpawar \
--hive-import \
--null-string 'Empty' \
--null-non-string 0 \
-m 2 \
--mapreduce-job-name "JOB :import to hive from mysql"  \
--warehouse-dir "/PRACTICALS/SQOP/retail_db/increment_hive_mysql4" \
--hive-partition-key customer_city \
--hive-partition-value 'CA'
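
Note that --hive-partition-key together with --hive-partition-value performs a static-partition load: every imported row is written into the single partition customer_city=CA, regardless of the customer_city value in MySQL. Where the files land can be checked directly in HDFS; the path below assumes the default Hive warehouse location on the quickstart VM:

    # List the files loaded into the CA partition (warehouse path assumed)
    hdfs dfs -ls /user/hive/warehouse/sumitpawar.db/customer2/customer_city=CA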

Contents of the options file, options_file.txt:

 [cloudera@quickstart SQOOP]$ cat options_file.txt 
  #############################################
 --connect
 jdbc:mysql://localhost:3306/retail_db

 --username
 root

 --password-file
 /PRACTICALS/SQOOP/password
 ############################################
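
For completeness, a typical way to prepare such a password file (the password string below is only a placeholder): Sqoop expects the file to contain nothing but the password, with no trailing newline, hence echo -n, and restrictive permissions are recommended:

    # Write the password to HDFS without a trailing newline (placeholder value)
    echo -n "PLACEHOLDER_PASSWORD" | hdfs dfs -put - /PRACTICALS/SQOOP/password
    # Restrict access, as recommended for password files
    hdfs dfs -chmod 400 /PRACTICALS/SQOOP/password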

Execution log of the Sqoop command:

    18/12/09 06:59:38 INFO mapreduce.Job: Running job: job_1542647782962_0231
    18/12/09 06:59:49 INFO mapreduce.Job: Job job_1542647782962_0231 running in uber mode : false
    18/12/09 06:59:49 INFO mapreduce.Job:  map 0% reduce 0%
    18/12/09 07:00:06 INFO mapreduce.Job:  map 50% reduce 0%
    18/12/09 07:00:07 INFO mapreduce.Job:  map 100% reduce 0%
    18/12/09 07:00:07 INFO mapreduce.Job: Job job_1542647782962_0231 completed successfully
    18/12/09 07:00:07 INFO mapreduce.Job: Counters: 30
                File System Counters
                        FILE: Number of bytes read=0
                        FILE: Number of bytes written=311480
                        FILE: Number of read operations=0
                        FILE: Number of large read operations=0
                        FILE: Number of write operations=0
                        HDFS: Number of bytes read=253
                        HDFS: Number of bytes written=220
                        HDFS: Number of read operations=8
                        HDFS: Number of large read operations=0
                        HDFS: Number of write operations=4
                  Job Counters 
                        Launched map tasks=2
                        Other local map tasks=2
                        Total time spent by all maps in occupied slots (ms)=29025
                        Total time spent by all reduces in occupied slots (ms)=0
                        Total time spent by all map tasks (ms)=29025
                        Total vcore-milliseconds taken by all map tasks=29025
                        Total megabyte-milliseconds taken by all map tasks=29721600
             Map-Reduce Framework
                        Map input records=6
                        Map output records=6
                        Input split bytes=253
                        Spilled Records=0
                        Failed Shuffles=0
                        Merged Map outputs=0
                        GC time elapsed (ms)=417
                        CPU time spent (ms)=2390
                        Physical memory (bytes) snapshot=276865024
                        Virtual memory (bytes) snapshot=3020857344
                        Total committed heap usage (bytes)=121765888
                  File Input Format Counters 
                        Bytes Read=0
                  File Output Format Counters 
                        Bytes Written=220
    18/12/09 07:00:07 INFO mapreduce.ImportJobBase: Transferred 220 bytes in 33.0333 seconds (6.66 bytes/sec)
    18/12/09 07:00:07 INFO mapreduce.ImportJobBase: Retrieved 6 records.
    18/12/09 07:00:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customer2` AS t LIMIT 1
    18/12/09 07:00:07 INFO hive.HiveImport: Loading uploaded data into Hive

    Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.1.0-cdh5.12.0.jar!/hive-log4j.properties
    OK
    Time taken: 3.016 seconds
    Loading data to table sumitpawar.customer2 partition (customer_city=CA)
    Partition sumitpawar.customer2{customer_city=CA} stats: [numFiles=2, numRows=0, totalSize=440, rawDataSize=0]
    OK
    Time taken: 1.056 seconds
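
The load itself reports success; as far as I know the numRows=0 statistic is normal here, since a plain LOAD only moves files and does not scan them. To see the raw records Sqoop actually wrote, the partition files can be inspected directly (path assumed from the default warehouse layout):

    # Dump the first few raw records written into the partition
    hdfs dfs -cat /user/hive/warehouse/sumitpawar.db/customer2/customer_city=CA/part-m-* | head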

------------------------------------------------------------------------------------------------

However, after querying the table in Hive, I see NULL values for every column other than customer_city.

Output from the Hive table:

hive (sumitpawar)> select * from customer2;
OK
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
| customer_id | customer_fname | customer_lname | customer_email | customer_password | customer_street | customer_city | customer_state | customer_zipcode |
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
|        NULL | NULL           | NULL           | NULL           | NULL              | NULL            | NULL          | NULL           | NULL             |
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
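
All-NULL columns like this often point to a mismatch between the field delimiter Sqoop wrote and the one the Hive table's SerDe expects, though that is only a guess on my side. The table side can be checked with:

    # Show the declared row format / SerDe of the Hive table
    hive -e "SHOW CREATE TABLE sumitpawar.customer2;"

If the delimiters differ, passing --fields-terminated-by to Sqoop (or recreating the table with a matching ROW FORMAT) would be the way to align them.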

Could someone let me know whether anything above is wrong, and how data should be loaded from Sqoop into a partitioned Hive table?

Regards, Sumit

0 Answers:

No answers yet.