HIVE - How to handle multiple quoteChar occurrences in a SerDe

Date: 2018-04-17 06:58:06

Tags: hadoop hive bigdata

I have a source CSV file whose data looks like this:

"201814","39","0598824","YELLOW JACKET TRAP W","PIEGE   GUEP.JAU,OUEST","ACT","7/20/2016","C/E","05","ST","N","15","2484","985.39999999999998","43.66","3762.36","53.05","   ","N","","5.83","7.9900","0.0000","0.0000","3.82","3.8181","7162","STERLING   INTN'L","d","12","YJTD-DB12-W-","12","32","0","0","0","\","3.68","0","","   ","","","","",""

I am using the SerDe below.

The CREATE statement used to load the data:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "|",
   "quoteChar"     = '\"',
   "escapeChar"    = '\\') 

The problem occurs when the value "\" is present in the file: the data then comes through as NULL.

Can you tell me how to handle this?
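One possible explanation, offered as an assumption rather than a confirmed diagnosis: with the escape character set to a backslash, a field whose entire content is `\` looks to the parser like an escaped quote, so the closing quote is swallowed and the field runs on into the next column. The sketch below uses Python's `csv` module as a stand-in for OpenCSV (the two parsers are not identical), and the three-column line is a shortened, hypothetical excerpt of the sample row.

```python
import csv

# Shortened, hypothetical excerpt of the sample row:
# the middle field is a lone backslash.
line = '"0","\\","3.68"'


def parse(text, escapechar):
    """Parse one comma-separated, double-quoted line and return its fields."""
    reader = csv.reader(
        [text],
        delimiter=",",
        quotechar='"',
        escapechar=escapechar,  # None disables escape handling
        doublequote=False,
    )
    return next(reader)


# Backslash as escape character: the quote after the backslash is taken
# as data, so the field does not end where the source intended.
with_backslash_escape = parse(line, "\\")

# No escape character: the backslash is ordinary data.
without_escape = parse(line, None)

print(with_backslash_escape)
print(without_escape)
```

In this stand-in parser the first call returns two fields instead of three, while the second keeps the backslash as its own field. If OpenCSVSerde behaves similarly, the remaining columns of the row would shift or fail to parse, which may be what surfaces as NULL.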

The complete DDL I am using:

CREATE EXTERNAL TABLE
    excess_inventory
    (
        whole_record string,
        yyyyww string,
        excess_wks_num string,
        product_num string,
        eng_desc string,
        fr_desc string,
        status string,
        corp_status_change_date string,
        whse_region string,
        whse_id string,
        channel_cd string,
        eap_ind string,
        fwos string,
        non_alloc_qty string,
        excess_qty string,
        excess_cube string,
        excess_inventory_dollars string,
        monthly_storage_cost string,
        deal_600 string,
        go_ind string,
        next_5_deals string,
        reg_adlr string,
        reg_retail string,
        r52_best_promo_adlr string,
        r52_best_promo_retail string,
        landed_cost string,
        corp_cost string,
        vendor_num string,
        vendor_nm string,
        vendor_origin string,
        vendor_moq string,
        vendor_part_num string,
        vendor_lead_tm string,
        total_lead_tm string,
        ingate_qty string,
        on_order_qty string,
        dealer_restriction_cd string,
        quote_cost string,
        casting_charge string,
        action_cd string,
        action_yyyyww string,
        action_qty string,
        sugg_adlr string,
        comments string,
        create_yyyyww string,
        user_nm string,
        batch_ts timestamp
) 
PARTITIONED BY (partition_batch_ts bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "|",
   "quoteChar"     = '\"',
   "escapeChar"    = '\\') 
 STORED AS TEXTFILE
LOCATION
'db/excess_inventory/table'
TBLPROPERTIES('skip.header.line.count'='1','serialization.null.format'='');

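Also, please let me know whether "separatorChar" = "|" means that the data will be stored in HDFS with a pipe delimiter, or whether it has to specify the delimiter used in the source file.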
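To make the second question concrete, here is a small sketch of what happens when the delimiter given to a CSV parser does not match the delimiter in the file. It again uses Python's `csv` module as a stand-in (an assumption, not OpenCSVSerde itself), and the line is a hypothetical three-column excerpt of the comma-separated sample row.

```python
import csv

# Hypothetical three-column excerpt of the comma-separated sample row.
line = '"201814","39","0598824"'


def split_fields(text, delimiter):
    """Split one double-quoted line using the given delimiter."""
    reader = csv.reader([text], delimiter=delimiter, quotechar='"')
    return next(reader)


# Parser told to expect pipes, but the file uses commas.
pipe_fields = split_fields(line, "|")

# Parser told to expect commas, matching the file.
comma_fields = split_fields(line, ",")

print(len(pipe_fields), pipe_fields)
print(len(comma_fields), comma_fields)
```

With the pipe delimiter the whole line comes back as a single field, while the comma delimiter yields three. This suggests that a separator setting has to match the delimiter actually present in the files being read.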

0 answers:

No answers