Copying datetimes in RFC 822 format into Redshift

Time: 2015-09-24 09:14:52

Tags: datetime amazon-web-services amazon-s3 format amazon-redshift

I have the following Redshift table:

DROP TABLE IF EXISTS "logs";
CREATE TABLE "logs" (
  "source" varchar(255) DEFAULT NULL,
  "method" varchar(255) DEFAULT NULL,
  "path" varchar(1023) DEFAULT NULL,
  "format" varchar(255) DEFAULT NULL,
  "controller" varchar(255) DEFAULT NULL,
  "action" varchar(255) DEFAULT NULL,
  "status" integer DEFAULT NULL,
  "duration" float DEFAULT NULL,
  "view" float DEFAULT NULL,
  "db" float DEFAULT NULL,
  "ip" varchar(255) DEFAULT NULL,
  "route" varchar(255) DEFAULT NULL,
  "request_id" varchar(255) DEFAULT NULL,
  "user" INTEGER DEFAULT NULL,
  "school" varchar(255) DEFAULT NULL,
  "timestamp" datetime DEFAULT NULL
);

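So far, so good.

The only problem is that the datetimes in my source files on S3 look like this: "2015-01-13T11:13:08.869941+00:00". This looks like RFC 822 (or RFC 3339 or RFC 2822).

The COPY command supports a few time formats (see the docs: http://docs.aws.amazon.com/redshift/latest/dg/r_DATEFORMAT_and_TIMEFORMAT_strings.html), but not my RFC 822 format.

I tried the following: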

TRUNCATE logs;
COPY "logs" FROM 's3://path/to/logstash_logfile.gz'
CREDENTIALS 'aws_access_key_id=THE_KEY;aws_secret_access_key=THE_SECRET'
TIMEFORMAT AS 'MM-DD-YYYYTHH:MI:SS'
JSON 's3://path/to/jsonpath.json' GZIP;

But I got:

SELECT * FROM stl_load_errors;

  

Invalid timestamp format or value [MM-DD-YYYYTHH:MI:SS]

2 Answers:

Answer 0: (score: 2)

Use TIMEFORMAT 'auto' instead.

It can import

2015-01-13T11:13:08.869941+00:00

as

2015-01-13 11:13:08.869941

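I think this approach simply discards the time zone information, but at least it gets the data loaded.

If your data contains a mix of time zones, you may need some preprocessing, such as converting everything to UTC.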
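As a rough illustration of that preprocessing step, here is a minimal Python sketch that rewrites a timestamp with an offset as a plain UTC string before the file is uploaded to S3. It rests on several assumptions not taken from the question: the log is JSON lines, the field is named "timestamp", values without an offset are already in UTC, and offsets use the +HH:MM form. The helper names `to_utc_string` and `convert_line` are made up for this example.

```python
import json
from datetime import datetime, timezone


def to_utc_string(value: str) -> str:
    """Convert an ISO 8601 style timestamp to a naive UTC string."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: values without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    utc = parsed.astimezone(timezone.utc)
    # Emit 'YYYY-MM-DD HH:MM:SS.ffffff' with no offset.
    return utc.strftime("%Y-%m-%d %H:%M:%S.%f")


def convert_line(line: str, field: str = "timestamp") -> str:
    """Rewrite one JSON log line; `field` is a hypothetical key name."""
    record = json.loads(line)
    if record.get(field):
        record[field] = to_utc_string(record[field])
    return json.dumps(record)


if __name__ == "__main__":
    sample = '{"timestamp": "2015-01-13T12:13:08.869941+01:00", "status": 200}'
    print(convert_line(sample))
```

The output no longer carries an offset, so it should be loadable with TIMEFORMAT 'auto'; I have not verified this against every COPY option, so try it on a small file first.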

Unfortunately, I think COPY with an explicit time format is quite strict and does not support the time zone part.

Answer 1: (score: 2)

We ran into exactly the same problem and found a workaround:

CREATE TABLE final_table ("ts_as_timestamptz" TIMESTAMPTZ);
CREATE TEMP TABLE helper_table ("ts_as_varchar" VARCHAR(64));

COPY "helper_table" FROM 's3://path/to/file.csv.gz'
CREDENTIALS 'aws_access_key_id=THE_KEY;aws_secret_access_key=THE_SECRET'
CSV
GZIP;

INSERT INTO final_table (ts_as_timestamptz)
SELECT ts_as_varchar::TIMESTAMPTZ FROM helper_table;

Or, alternatively:

CREATE TABLE final_table ("ts_as_timestamp" TIMESTAMP);
CREATE TEMP TABLE helper_table ("ts_as_varchar" VARCHAR(64));

COPY "helper_table" FROM 's3://path/to/file.csv.gz'
CREDENTIALS 'aws_access_key_id=THE_KEY;aws_secret_access_key=THE_SECRET'
CSV
GZIP;

INSERT INTO final_table (ts_as_timestamp)
SELECT ts_as_varchar::TIMESTAMPTZ FROM helper_table;

You can test this quickly:

DROP TABLE IF EXISTS helper_table;
CREATE TEMP TABLE helper_table ("ts_as_varchar" VARCHAR(64));
INSERT INTO helper_table (ts_as_varchar) VALUES 
    ('2015-01-13T11:13:08.869941+00:00'),
    ('2015-01-13T12:13:08.869941+01:00'),
    ('2015-01-13T13:13:08.869+02:00'), 
    ('2015-01-13T14:13:08+03:00'),
    ('2015-01-13T11:13:08'),
    ('2015-01-13 11:13:08.869941+00:00'),
    ('2015-01-13 12:13:08.869941+01:00'),
    ('2015-01-13 13:13:08.869+02:00'), 
    ('2015-01-13 14:13:08+03:00'),
    ('2015-01-13 11:13:08')
;

DROP TABLE IF EXISTS final_table;
CREATE TEMP TABLE final_table (
    "ts_as_varchar" VARCHAR(64),
    "ts_as_timestamptz" TIMESTAMPTZ,
    "ts_as_timestamp" TIMESTAMP
    );
INSERT INTO final_table (ts_as_varchar, ts_as_timestamptz, ts_as_timestamp)
SELECT ts_as_varchar, ts_as_varchar::TIMESTAMPTZ, ts_as_varchar::TIMESTAMPTZ
FROM helper_table;

-- The following depends on the time zone of your SQL client, so the results may vary. It is also vulnerable to the SQL client removing the sub-second parts.
-- SELECT * FROM final_table;
-- The following may (?) work better even if your SQL client is not in UTC
SELECT ts_as_varchar, ts_as_varchar::varchar, ts_as_varchar::varchar FROM final_table;

This gives the following results:

ts_as_varchar                       ts_as_timestamptz                   ts_as_timestamp
2015-01-13 12:13:08.869941+01:00    2015-01-13 12:13:08.869941+01:00    2015-01-13 12:13:08.869941+01:00
2015-01-13T13:13:08.869+02:00       2015-01-13T13:13:08.869+02:00       2015-01-13T13:13:08.869+02:00
2015-01-13T11:13:08.869941+00:00    2015-01-13T11:13:08.869941+00:00    2015-01-13T11:13:08.869941+00:00
2015-01-13 11:13:08.869941+00:00    2015-01-13 11:13:08.869941+00:00    2015-01-13 11:13:08.869941+00:00
2015-01-13 13:13:08.869+02:00       2015-01-13 13:13:08.869+02:00       2015-01-13 13:13:08.869+02:00
2015-01-13T14:13:08+03:00           2015-01-13T14:13:08+03:00           2015-01-13T14:13:08+03:00
2015-01-13 11:13:08                 2015-01-13 11:13:08                 2015-01-13 11:13:08
2015-01-13T12:13:08.869941+01:00    2015-01-13T12:13:08.869941+01:00    2015-01-13T12:13:08.869941+01:00
2015-01-13 14:13:08+03:00           2015-01-13 14:13:08+03:00           2015-01-13 14:13:08+03:00
2015-01-13T11:13:08                 2015-01-13T11:13:08                 2015-01-13T11:13:08

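Tested with Redshift 1.0.2610. Note that your SQL client or driver may perform time zone conversions that can be misleading, so it is best to test with UTC as the time zone of your machine, driver, and SQL client. Also, some SQL clients drop the sub-second part of timestamps.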