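I have a data source file that I load into Redshift with the COPY command.
The file has a bunch of date columns with two-digit years (I know, I'm dealing with dinosaurs here).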
Redshift recognizes the date format, but the problem is that the file has values like:
06/01/79
which actually means 1979-06-01. However, Redshift interprets it as:
2079-06-01
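Is there a way to tell Redshift what threshold to use for two-digit years? For example, values below 90 should be interpreted as 20XX.
The DATEFORMAT parameter of the COPY command has no such option.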
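The rule being asked for is usually called a pivot (or cutoff) year: two-digit years below the cutoff go to the 2000s, everything else to the 1900s. A minimal sketch of that rule in Python (not Redshift SQL; the function name `four_digit_year` and its `cutoff` parameter are illustrative only):

```python
def four_digit_year(yy: int, cutoff: int) -> int:
    """Map a two-digit year (0-99) to a four-digit year.

    Years strictly below `cutoff` are placed in the 2000s,
    all other years in the 1900s.
    """
    if not 0 <= yy <= 99:
        raise ValueError(f"expected a two-digit year, got {yy}")
    return 2000 + yy if yy < cutoff else 1900 + yy


# Illustrative cutoff of 50 (hypothetical, pick one that fits your data)
print(four_digit_year(79, cutoff=50))
print(four_digit_year(5, cutoff=50))
```

With a cutoff of 50, `79` becomes 1979 and `5` becomes 2005. For `79` to land in the 1900s the cutoff must be 79 or lower; a cutoff such as 85 or 90 maps it to 2079.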
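Answer 0 (score: 0)
One workaround is to load the column as plain text into a temporary table, then apply the cutoff yourself when inserting into the final table: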
-- Begin a transaction
BEGIN TRANSACTION;
-- Create a temp table that holds the raw 'MM/DD/YY' string
CREATE TEMP TABLE my_temp (dtm_str CHAR(8));
-- Load your data into the temp table
-- (the S3 location must be a quoted string; the remaining options are elided)
COPY my_temp FROM 's3://my_bucket' … ;
-- Insert the converted dates into the final table
INSERT INTO final_table
SELECT TO_DATE(
         -- Keep the first 6 chars ('MM/DD/') ...
         LEFT(dtm_str, 6) ||
         -- ... convert the last 2 chars to an INT and compare to your threshold
         -- (replace 85 with your own cutoff): below it -> 20YY, otherwise 19YY
         CASE WHEN CAST(RIGHT(dtm_str, 2) AS INT) < 85
              THEN '20'
              ELSE '19'
         END || RIGHT(dtm_str, 2),
         -- Parse the rebuilt 'MM/DD/YYYY' string as a DATE
         'MM/DD/YYYY') AS new_dtm
FROM my_temp;
COMMIT;
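To sanity-check a cutoff on a few sample values before running the INSERT, the same string rewrite can be reproduced outside Redshift. This Python sketch mirrors the SQL step by step (first six characters, century prefix, last two characters, then the final conversion to a date); `convert_mm_dd_yy` is a hypothetical helper name, and its default cutoff of 85 is the one used in the answer:

```python
from datetime import date, datetime


def convert_mm_dd_yy(dtm_str: str, cutoff: int = 85) -> date:
    """Mirror the SQL: LEFT(dtm_str, 6) || century || RIGHT(dtm_str, 2)."""
    month_day = dtm_str[:6]  # 'MM/DD/', like LEFT(dtm_str, 6)
    yy = dtm_str[-2:]        # 'YY', like RIGHT(dtm_str, 2)
    # Years below the cutoff get the '20' prefix, the rest get '19'
    century = "20" if int(yy) < cutoff else "19"
    # Parse the rebuilt 'MM/DD/YYYY' string into a date
    return datetime.strptime(month_day + century + yy, "%m/%d/%Y").date()


for raw in ["06/01/79", "12/31/99", "01/15/05"]:
    print(raw, "->", convert_mm_dd_yy(raw), convert_mm_dd_yy(raw, cutoff=50))
```

Running it shows that with the default cutoff of 85, `06/01/79` still becomes 2079-06-01, while a cutoff of 50 gives 1979-06-01, so it is worth checking the threshold against real values from the file.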