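I am trying to create a Parquet table from a CSV extract (generated from an Oracle database table) that contains more than a million rows. About 25 of those rows have an empty START_DATE value, and the CTAS fails because it cannot interpret "" as null. Any suggestions would be greatly appreciated.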
CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
to_timestamp(columns[3], 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;
Error: SYSTEM ERROR: IllegalArgumentException: Invalid format ""
Answer 0 (score: 0)
You can always include a CASE expression that maps the empty entries to null before the conversion runs:
CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
CASE WHEN columns[3] = '' THEN null
ELSE to_timestamp(columns[3], 'dd-MMM-yy HH.mm.ss.SSSSSS a')
END as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;
Answer 1 (score: 0)
You can also use the NULLIF() function, as shown below:
CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
to_timestamp(NULLIF(columns[3],''), 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;
NULLIF converts the empty string to null, so the conversion no longer fails.
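In standard SQL, NULLIF(a, b) is shorthand for CASE WHEN a = b THEN NULL ELSE a END, so both answers should produce the same result. As a quick sanity check after either CTAS, a query along these lines could confirm that the empty values were stored as null. This is only a sketch: it assumes the table was created as dfs.tmp.FOO with the column names used above, and the count should come out at roughly the 25 rows mentioned in the question.

```sql
-- Count the rows whose START_DATE was written as null by the CTAS.
-- Assumes the table dfs.tmp.FOO created by one of the statements above.
SELECT COUNT(*) AS null_start_dates
FROM dfs.tmp.FOO
WHERE START_DATE IS NULL;
```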