I am trying to reformat a text log file as CSV. Each entry beginning with the prefix ("t=%m p=%p h=%h db=%d u=%u x=%x") continues until the next prefix line, and may contain \n and \r escape sequences.
t=2020-08-25 15:00:00.000 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0 LOG: duration: 0.011 ms execute S_40: SELECT ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID FROM DB_LOG WHERE (ID = $1)
t=2020-08-25 15:00:00.000 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0 DETAIL: parameters: $1 = '9187372'
t=2020-08-25 15:00:00.001 +03 p=36001 h=127.0.0.1 db=test u=test_app x=0 LOG: duration: 0.005 ms bind S_1: COMMIT
t=2020-08-25 15:00:00.001 +03 p=36001 h=127.0.0.1 db=test u=test_app x=0 LOG: duration: 0.004 ms execute S_1: COMMIT
t=2020-08-25 15:00:00.001 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0 LOG: duration: 0.018 ms bind S_41: INSERT INTO DB_LOG (ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)
t=2019-12-19 17:00:00.102 +03 p=58042 h= db= u= x=0 LOG: automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0
        pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen
        tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983
        buffer usage: 90 hits, 0 misses, 0 dirtied
        avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
        system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
The SQL statements following the prefix are generally not of fixed length.
Ideally the prefixes would be stripped; each line should be formatted as follows:
"2020-08-25 15:00:00.000 +03","16205","127.0.0.1","test","test_app","0","LOG:"," duration: 0.011 ms execute S_40: SELECT ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID FROM DB_LOG WHERE (ID = $1)"
"2020-08-25 15:00:00.000 +03","16205","127.0.0.1","test","test_app","0","DETAIL:"," parameters: $1 = '9187372'"
"2020-08-25 15:00:00.001 +03","36001","127.0.0.1","test","test_app","0","LOG:"," duration: 0.005 ms bind S_1: COMMIT"
"2020-08-25 15:00:00.001 +03","36001","127.0.0.1","test","test_app","0","LOG:"," duration: 0.004 ms execute S_1: COMMIT"
"2020-08-25 15:00:00.001 +03","16205","127.0.0.1","test","test_app","0","LOG:"," duration: 0.018 ms bind S_41: INSERT INTO DB_LOG (ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)"
"2019-12-19 17:00:00.102 +03","58042","","","","0","LOG:"," automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983 buffer usage: 90 hits, 0 misses, 0 dirtied avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s"
regex101: https://regex101.com/r/R3vADD/4
However, I suspect the last part of the expected output will cause problems when copying the CSV file into the database, because the quoted table name contains double quotes:
" automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983 buffer usage: 90 hits, 0 misses, 0 dirtied avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s"
Thanks.
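For reference, the transformation described above can be sketched in Python, letting the csv module handle the quote doubling that the embedded double quotes require. This is a hypothetical illustration, not part of the question: the regex is an assumption modeled on the sample lines (e.g. it expects a "+NN" UTC offset in the timestamp).

```python
import csv
import io
import re

# Prefix pattern assumed from the sample lines; the named groups mirror
# the t/p/h/db/u/x fields plus the level ("LOG:"/"DETAIL:") and message.
PREFIX = re.compile(
    r'^t=(?P<t>.*? \+\d+) p=(?P<p>\d*) h=(?P<h>\S*) db=(?P<db>\S*)'
    r' u=(?P<u>\S*) x=(?P<x>\d*) (?P<lvl>\w+:) ?(?P<msg>.*)$'
)

def log_to_csv(lines):
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)  # doubles embedded quotes
    entry = None
    for line in lines:
        m = PREFIX.match(line.rstrip('\r\n'))
        if m:
            if entry:
                writer.writerow(entry)
            g = m.groupdict()
            entry = [g['t'], g['p'], g['h'], g['db'], g['u'], g['x'],
                     g['lvl'], g['msg']]
        elif entry:
            # Continuation line: fold it into the message field.
            entry[-1] += ' ' + line.strip()
    if entry:
        writer.writerow(entry)
    return out.getvalue()
```

Because the csv module doubles any embedded `"`, the multi-line "automatic vacuum" entry comes out as a single valid CSV record.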
Answer 0 (score: 3)
With GNU awk for FPAT, the 3rd arg to match(), and \s/\S as shorthand for [[:space:]]/[^[:space:]]:
$ cat tst.awk
BEGIN {
    FPAT = "[[:alnum:]]+=[^=]* "
    OFS  = ","
}
/^\S/ { if (NR>1) prt() }
      { prev = prev $0 }
END   { prt() }

function prt(   orig, i, a) {
    orig = $0
    $0 = prev
    match($0, /(.* )(LOG|DETAIL): +(.*)/, a)
    $0 = a[1]
    $(NF+1) = a[2]
    $(NF+1) = a[3]
    for (i=1; i<=NF; i++) {
        gsub(/^\s+|\s+$/, "", $i)
        sub(/^\S+=/, "", $i)
        gsub(/"/, "\"\"", $i)
        printf "\"%s\"%s", $i, (i<NF ? OFS : ORS)
    }
    $0 = orig
    prev = ""
}
$ awk -f tst.awk file
"2020-08-25 15:00:00.000 +03","16205","127.0.0.1","test","test_app","0","LOG","duration: 0.011 ms execute S_40: SELECT ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID FROM DB_LOG WHERE (ID = $1)"
"2020-08-25 15:00:00.000 +03","16205","127.0.0.1","test","test_app","0","DETAIL","parameters: $1 = '9187372'"
"2020-08-25 15:00:00.001 +03","36001","127.0.0.1","test","test_app","0","LOG","duration: 0.005 ms bind S_1: COMMIT"
"2020-08-25 15:00:00.001 +03","36001","127.0.0.1","test","test_app","0","LOG","duration: 0.004 ms execute S_1: COMMIT"
"2020-08-25 15:00:00.001 +03","16205","127.0.0.1","test","test_app","0","LOG","duration: 0.018 ms bind S_41: INSERT INTO DB_LOG (ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)"
"2019-12-19 17:00:00.102 +03","58042","","","","0","LOG","automatic vacuum of table ""postgres.pgagent.pga_job"": index scans: 0 pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983 buffer usage: 90 hits, 0 misses, 0 dirtied avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s"
The last line of expected output in your question contains " automatic vacuum of table "postgres.pgagent.pga_job": index ..." but that's not valid CSV, because you can't have unescaped double quotes inside a double-quote-delimited string. To be valid CSV it would have to be either " automatic vacuum of table ""postgres.pgagent.pga_job"": index ..." or " automatic vacuum of table \"postgres.pgagent.pga_job\": index ..." (depending on which escaping construct whatever tool you're going to read it with expects; there is no single "standard", see What's the most robust way to efficiently parse CSV using awk?). I decided to use "" for that case in the script above since that's what MS-Excel expects, but it would be a trivial tweak to use \" instead if you need that: just change gsub(/"/,"\"\"",$i) to gsub(/"/,"\\\\\"",$i).
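The two escaping conventions contrasted above can be illustrated with Python's csv module, whose doublequote and escapechar dialect options correspond to the "" (Excel) and \" styles:

```python
import csv
import io

row = ['LOG:', ' automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0']

# Excel-style: embedded double quotes are doubled ("")
excel = io.StringIO()
csv.writer(excel, quoting=csv.QUOTE_ALL).writerow(row)

# Backslash-style: embedded double quotes become \" instead
backslash = io.StringIO()
csv.writer(backslash, quoting=csv.QUOTE_ALL,
           doublequote=False, escapechar='\\').writerow(row)
```

Whichever convention you pick, the reader (e.g. PostgreSQL's COPY ... CSV options) has to be configured to expect the same one.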
Answer 1 (score: 1)
Here you go: https://regex101.com/r/R3vADD/1
^t=(.* .*) p=(\d+)? h=(.*)? db=(\w+)? u=(\w+)? x=(\d+)? (\w+:) (.*)
This will match the groups, which you can then substitute like this:
"\1","\2","\3","\4","\5","\6","\7","\8"
Example on the command line with Perl:
cat file.csv|perl -pe 's/^t=(.* .*) p=(\d+) h=(.*) db=(\w+) u=(\w+) x=(\d+) (\w+:) (.*)/"\1","\2","\3","\4","\5","\6","\7","\8"/g'
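For comparison, a rough Python equivalent of the Perl one-liner, using the same pattern and capture groups. Note that, like the Perl version, it handles one prefixed line at a time: it does not fold continuation lines or escape embedded double quotes (the issue discussed in the other answer).

```python
import re

# Same regex and groups as the Perl one-liner above.
pattern = re.compile(
    r'^t=(.* .*) p=(\d+) h=(.*) db=(\w+) u=(\w+) x=(\d+) (\w+:) (.*)'
)

def to_csv_line(line):
    # Rewrite one prefixed log line as a quoted CSV record.
    return pattern.sub(r'"\1","\2","\3","\4","\5","\6","\7","\8"', line)
```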