Parsing Apache logs with PostgreSQL

Date: 2010-10-17 11:33:53

Tags: apache parsing postgresql logging

This O'Reilly article gives an example PostgreSQL statement for parsing Apache log lines:

 INSERT INTO http_log(log_date,ip_addr,record)
     SELECT CAST(substr(record,strpos(record,'[')+1,20) AS date),
            CAST(substr(record,0,strpos(record,' ')) AS cidr),
            record
 FROM tmp_apache;

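Obviously, this only extracts the IP and timestamp fields. Is there a canonical statement for extracting all fields from a typical combined log format record? If not, I'll write one, and I promise to post the result here!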

3 Answers:

Answer 0: (score: 3)

OK, here is my solution:

insert into accesslog
select m[1], m[2], m[3],
    -- reformat the Apache timestamp and re-attach its UTC offset
    (to_char(to_timestamp(m[4], 'DD/Mon/YYYY:HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS ')
        || split_part(m[4], ' ', 2))::timestamp with time zone,
    m[5], m[6]::smallint,
    -- Apache logs '-' instead of 0 when no bytes were sent
    (case m[7] when '-' then '0' else m[7] end)::integer,
    m[8], m[9]
from (
    select regexp_matches(record,
        E'(.*) (.*) (.*) \\[(.*)\\] "(.*)" (\\d+) (.*) "(.*)" "(.*)"') as m
    from tmp_apache) s;

It takes the raw log lines from the table tmp_apache and uses a regexp to extract the fields into an array.
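
To see what ends up in the array, the same pattern can be run against a single literal line. This is only a sketch: the sample log line and the column aliases below are made up for illustration and do not come from the original answer.

```sql
-- Hypothetical combined-format log line, used only for illustration.
select m[1] as client_ip,
       m[4] as raw_time,
       m[5] as request,
       m[6] as status,
       m[7] as bytes,
       m[8] as referer,
       m[9] as user_agent
from (
    select regexp_matches(
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0"',
        E'(.*) (.*) (.*) \\[(.*)\\] "(.*)" (\\d+) (.*) "(.*)" "(.*)"') as m
) s;
```

Every element comes back as text, which is why the insert statement casts the status and byte count and rebuilds the timestamp from m[4].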

Answer 1: (score: 0)

Here is my more complete solution.

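The Apache log file should not contain invalid characters or backslashes, because COPY in its default text format treats a backslash as an escape character. If necessary, the following command strips non-printable bytes and drops every line that contains a backslash: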

cat logfile | strings | grep -v '\\' > cleanedlogfile

Then copy the log file into Postgres and parse it (m[1] through m[7] correspond to the regex groups in the regexp_matches call):

-- sql for postgres:
drop table if exists rawlog;
create table rawlog (record varchar);
-- import data from log file
copy rawlog from '/path/to/your/apache/cleaned/log/file';
-- parse the rawlog into table accesslog
drop table if exists accesslog;
create table accesslog as
(select m[1] as clientip,
  (to_char(to_timestamp(m[4], 'DD/Mon/YYYY:HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS ')
        || split_part(m[4], ' ',2))::timestamp with time zone as "time",
  split_part(m[5], ' ', 1) as method,
  split_part(split_part(m[5], ' ', 2), '?', 1) as uri,
  split_part(split_part(m[5], ' ', 2), '?', 2) as query,
  m[6]::smallint as status,
  m[7]::bigint as bytes
    from
(select 
  regexp_matches(record, E'(.*) (.*) (.*) \\[(.*)\\] "(.*)" (\\d+) (\\d+)') as m 
   from rawlog) s);
-- optionally create indexes
create index accesslogclientipidx on accesslog(clientip);
create index accesslogtimeidx on accesslog("time");
create index accessloguriidx on accesslog(uri);
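
One thing worth checking after the import: regexp_matches returns no row at all when a line does not match, so such lines are silently left out of accesslog. In particular, the pattern in this script requires digits for the byte count, while Apache writes '-' when no bytes were sent. A minimal sketch for counting and inspecting the skipped lines, assuming the rawlog table from the script above is still present:

```sql
-- Count raw lines that the parsing pattern does not match.
select count(*) as unparsed_lines
from rawlog
where record !~ E'(.*) (.*) (.*) \\[(.*)\\] "(.*)" (\\d+) (\\d+)';

-- Look at a few of them to see why they were skipped.
select record
from rawlog
where record !~ E'(.*) (.*) (.*) \\[(.*)\\] "(.*)" (\\d+) (\\d+)'
limit 10;
```

If the count is not zero, one option is to loosen the last group to (.*) and map '-' to 0 before casting, as the first answer does.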

Answer 2: (score: -1)

For information on logging to Postgres, see this blog post.