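I'm looking for a way to parse records with awk when a field can itself contain \n characters. Fields are separated by |. The idea is that a new record can be detected once a certain number of fields has been reached. How can this be done in awk?
Example: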
2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|172.30.10.212|56809|2013-03-24 15:49:32 AFT|10354453|con2326|cmd7|seg-1||dx318412|x10354453|sx1|LOG: |00000|statement: SET DATESTYLE = "ISO"; Select *
from bb
where cc='1'||||||SET DATESTYLE = "ISO"; Select * from bb where cc='1'|0||postgres.c|1447|
2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|172.30.0.186|3304|2013-04-10 12:44:29 AFT|10400046|con67|cmd5|seg-1||dx341|x10400046|sx1|LOG: |00000|statement: create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)||||||create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)
|0||postgres.c|1447|
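Each entry above is a single record that contains several \n characters. I need to parse the records with awk and get, for example, the 5th field.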
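Answer 0 (score: 3)
Drawing on sudo_O's answer:
Set the variable FIELD_TO_PRINT to the position of the field of interest, and set another variable, FIELDS_PER_RECORD, to the number of fields that make up a record. Tested with GNU awk on Ubuntu: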
awk -v FIELDS_PER_RECORD=10 -v FIELD_TO_PRINT=5 'BEGIN{FS="|"; RS="\0"}\
{for (i=1; i<=NF; ++i) {if (i % FIELDS_PER_RECORD == FIELD_TO_PRINT) {print $i} }}' file_name.txt
th2056569632
x10354453
SET DATESTYLE = "ISO"; Select * from bb where cc='1'
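For comparison, the same "split the whole input on | and pick fields by position" idea can be sketched in Python. This is only an illustration and not part of the original answer: the function name `field_from_each_record` and its parameters are hypothetical, and it assumes every record ends with a trailing `|`, as in the sample. Under that assumption, the empty last field of one record and the first field of the next share a slot once everything is split on `|`, so the pattern repeats every `fields_per_record - 1` positions.

```python
def field_from_each_record(text, fields_per_record, field_to_print, sep="|"):
    """Return the chosen field (1-based) of every record in text.

    Assumption: each record ends with a trailing separator, so its last
    field is empty and the record contributes (fields_per_record - 1)
    separators to the stream.
    """
    if not 1 <= field_to_print < fields_per_record:
        raise ValueError("field_to_print must be in 1..fields_per_record - 1")
    stride = fields_per_record - 1
    parts = text.split(sep)
    result = []
    # A complete record needs `stride` parts plus one more part, the
    # slot it shares with the start of the next record.
    for start in range(0, len(parts) - stride, stride):
        value = parts[start + field_to_print - 1]
        # The first field of every later record still carries the
        # newline that ended the previous record; drop it.
        if field_to_print == 1:
            value = value.lstrip("\n")
        result.append(value)
    return result


# Two made-up 4-field records; the third field of the first spans two lines.
demo = "a|b|multi\nline|\nc|d|e|\n"
print(field_from_each_record(demo, fields_per_record=4, field_to_print=2))
# prints ['b', 'd']
```

With the sample data from the question, this would be called with `fields_per_record=30` and `field_to_print=5`.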
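Answer 1 (score: 1)
For a single record in the file, you can set the record separator to the null character, RS='\0', so that the input file is read as one complete record: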
$ awk '{print $5}' FS='|' RS='\0' file
th2056569632
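For multiple records, you can use the date as the record separator (unless the records are already separated by blank lines, which would make things simpler, or unless you need this field in the output):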
$ awk 'NR>1{print $5}' FS='|' RS='(^|[^|])[0-9]{4}-[0-9]{2}-[0-9]{2} ' file
th2056569632
th1093698336
Or would a simple grep -o 'th[0-9]*' file be more suitable?
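The date-based record separator can also be sketched in Python for comparison. This is a minimal sketch, not the answer's own method: the names `RECORD_START` and `split_on_timestamps` are hypothetical, the demo data is a shortened, made-up variant of the sample, and the approach assumes that no continuation line of a multi-line field begins with a timestamp. Unlike the awk version, the timestamp is kept as the first field of each record rather than consumed as the separator.

```python
import re

# Assumption: a record starts at the beginning of a line with a
# timestamp such as "2013-03-24 15:49:40".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", re.MULTILINE)


def split_on_timestamps(text):
    """Split text into records, each starting at a line-leading timestamp."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


# Shortened illustration data: the last field of the first record
# continues on a second line.
demo = (
    "2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|select *\n"
    "from bb|\n"
    "2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|select 1|\n"
)
for record in split_on_timestamps(demo):
    print(record.split("|")[4])
```

The timestamp in the eighth field of the real sample is not matched, because it follows a `|` rather than starting a line.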
Answer 2 (score: 1)
Obviously not what you asked for, but for comparison, here is how I would do it in Python:
from io import StringIO

def records_from_file(f, separator='|', field_count=30):
    record = []
    for line in f:
        fields = line.split(separator)
        if len(record) > 0:
            # Merge last of existing with first of new
            record[-1] += fields[0]
            # Extend rest of fields
            record.extend(fields[1:])
        else:
            record.extend(fields)
        if len(record) > field_count:
            raise Exception("Concatenating records overflowed number of fields", record)
        elif len(record) == field_count:
            # A full record has been collected; emit it and start over
            yield record
            record = []

sample = """2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|172.30.10.212|56809|2013-03-24 15:49:32 AFT|10354453|con2326|cmd7|seg-1||dx318412|x10354453|sx1|LOG: |00000|statement: SET DATESTYLE = "ISO"; Select *
from bb
where cc='1'||||||SET DATESTYLE = "ISO"; Select * from bb where cc='1'|0||postgres.c|1447|
2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|172.30.0.186|3304|2013-04-10 12:44:29 AFT|10400046|con67|cmd5|seg-1||dx341|x10400046|sx1|LOG: |00000|statement: create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)||||||create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)
|0||postgres.c|1447|"""

for record in records_from_file(StringIO(sample)):
    print(record[4])
Output:
th2056569632
th1093698336