How to handle NULL values with the ToDate function in Apache Pig

Time: 2015-11-19 13:25:30

Tags: hadoop apache-pig

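My input contains datetime data and I want to load it correctly in Pig. After some searching, I understand the recommended approach is to load it as chararray and then convert it to datetime with the ToDate function. However, my datetime fields are sometimes NULL, and when I apply ToDate to them I get a NullPointerException from Pig. I am trying to use the bincond (ternary) operator, but I get the following error:

  mismatched input '?' expecting SEMI_COLON

That doesn't make sense to me.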

=====================================================================

Here is my code so far:

transactions_edited = FOREACH transactions GENERATE  
  id,
  code,
  user_id,
  visit_code,
  channel_id,
  transaction_type,
  product_category,
  product_subcategory,
  specific_id,
  specific_type,
  email,
  cpf,
  name,
  last_name,
  gender,
  birth_date,
  phone_code,
  phone,
  additional_phone_code,
  additional_phone,
  zip_code,
  monthly_income,
  status,
  opportunity_status,
  ToDate(created_at,'yyyy-MM-dd HH:mm:ss') AS created_at,
  ToDate(updated_at,'yyyy-MM-dd HH:mm:ss') AS updated_at,
  old_status,
  old_masked_id,
  address_type,
  address,
  address_number,
  address_complement,
  neighborhood,
  city,
  state,
  landing_path,
  referrer,
  source,
  source_advertising,
  keyword,
  ad_id,
  ad_name,
  ad_network,
  ad_placement,
  ad_device,
  cpf_restriction,
  mother_name,
  registration_form_closed,
  (opportunity_status_updated_at is not NULL ) ?ToDate(opportunity_status_updated_at,'yyyy-MM-dd HH:mm:ss') : AS opportunity_status_updated_at,
  potential,
  interest,
  lead_id,
  (integrated_at is not NULL) ? ToDate(integrated_at,'yyyy-MM-dd HH:mm:ss') : AS integrated_at,
  starred,
  channel_input_type,
  rg 
  ;

Any help would be much appreciated.

Thanks!

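1 Answer:

Answer 0: (score: 0)

The ternary operator needs a small fix: the whole conditional, including the '?' and ':' branches, has to sit inside one pair of parentheses, and the ':' must be followed by an expression. Please use the modified code below: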

transactions_edited = FOREACH transactions GENERATE  
  id,
  code,
  user_id,
  visit_code,
  channel_id,
  transaction_type,
  product_category,
  product_subcategory,
  specific_id,
  specific_type,
  email,
  cpf,
  name,
  last_name,
  gender,
  birth_date,
  phone_code,
  phone,
  additional_phone_code,
  additional_phone,
  zip_code,
  monthly_income,
  status,
  opportunity_status,
  ToDate(created_at,'yyyy-MM-dd HH:mm:ss') AS created_at,
  ToDate(updated_at,'yyyy-MM-dd HH:mm:ss') AS updated_at,
  old_status,
  old_masked_id,
  address_type,
  address,
  address_number,
  address_complement,
  neighborhood,
  city,
  state,
  landing_path,
  referrer,
  source,
  source_advertising,
  keyword,
  ad_id,
  ad_name,
  ad_network,
  ad_placement,
  ad_device,
  cpf_restriction,
  mother_name,
  registration_form_closed,
  (opportunity_status_updated_at is not NULL ? ToDate(opportunity_status_updated_at,'yyyy-MM-dd HH:mm:ss') : opportunity_status_updated_at) AS opportunity_status_updated_at,
  potential,
  interest,
  lead_id,
  (integrated_at is not NULL ? ToDate(integrated_at,'yyyy-MM-dd HH:mm:ss') : integrated_at) AS integrated_at,
  starred,
  channel_input_type,
  rg 
  ;
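
One caveat about the else branch, offered as a hedged sketch rather than a verified fix: if the field was loaded as chararray, the expression after ':' above is a chararray while ToDate returns a datetime, and Pig may reject a bincond whose two branches have different types. A common alternative is to return null in the else branch. The file path, delimiter, and three-column schema below are hypothetical and only serve to keep the example short. Whether a bare null is accepted there should be checked against your Pig version.

```pig
-- Hypothetical input: a tab-separated file with two datetime columns stored as text.
transactions = LOAD 'transactions.tsv' USING PigStorage('\t')
    AS (id:long, created_at:chararray, integrated_at:chararray);

transactions_edited = FOREACH transactions GENERATE
    id,
    ToDate(created_at, 'yyyy-MM-dd HH:mm:ss') AS created_at,
    -- The whole bincond is wrapped in one pair of parentheses.
    -- The else branch is null, so the result is either a datetime or null.
    (integrated_at IS NOT NULL
        ? ToDate(integrated_at, 'yyyy-MM-dd HH:mm:ss')
        : null) AS integrated_at;

-- DESCRIBE shows the schema Pig inferred, which is a quick way to confirm
-- that integrated_at ended up as datetime.
DESCRIBE transactions_edited;
```

The same pattern would apply to opportunity_status_updated_at, and to created_at and updated_at if those columns can also be NULL.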