Question

寻找解析awk记录的解决方案，其中，也可以是/n个字符。记录以|分隔。问题是，当达到一定数量的字段时，可以确定新行。如何在awk中完成这项工作？

示例：

2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|172.30.10.212|56809|2013-03-24 15:49:32 AFT|10354453|con2326|cmd7|seg-1||dx318412|x10354453|sx1|LOG: |00000|statement: SET DATESTYLE = "ISO"; Select * 
from bb 
where cc='1'||||||SET DATESTYLE = "ISO"; Select * from bb where cc='1'|0||postgres.c|1447|
2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|172.30.0.186|3304|2013-04-10 12:44:29 AFT|10400046|con67|cmd5|seg-1||dx341|x10400046|sx1|LOG: |00000|statement: create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)||||||create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)
|0||postgres.c|1447|

是一条记录，其中包含许多\n个字符。我需要用awk解析并获得例如第5个字段。

Answer 1

从sudo_O上面的答案中汲取灵感...... 将变量FIELD_TO_PRINT设置为感兴趣的字段位置，将另一个变量FIELDS_PER_RECORD设置为表示记录的字段数。在Ubuntu上使用GNU awk进行测试

awk   -v FIELDS_PER_RECORD=10 -v FIELD_TO_PRINT=5 'BEGIN{FS="|"; RS="\0"}\
{for (i=1; i<=NF; ++i) {if (i % FIELDS_PER_RECORD == FIELD_TO_PRINT) {print $i} }}' file_name.txt
th2056569632
x10354453
SET DATESTYLE = "ISO"; Select * from bb where cc='1'

Answer 2

对于文件中的一条记录，您无法将记录分隔符设置为空字符RS='\0'，因此输入文件将作为一个完整记录读取：

$ awk '{print $5}' FS='|' RS='\0' file
th2056569632

对于多个记录，您可以使用date作为记录分隔符（除非它们已经用空白行分隔，这会使事情更简单或除非您在输出中需要此字段）：

$ awk 'NR>1{print $5}' FS='|' RS='(^|[^|])[0-9]{4}-[0-9]{2}-[0-9]{2} ' file
th2056569632
th1093698336

更简单grep -o 'th[0-9]*' file更适合吗？

Answer 3

显然，不是你要求的：为了比较，这是我在python中如何做到这一点：

from cStringIO import StringIO

def records_from_file(f,separator='|',field_count=30):
  record = []
  for line in f:
    fields = line.split(separator)
    if len(record) > 0:
      # Merge last of existing with first of new
      record[-1] += fields[0]
      # Extend rest of fields
      record.extend(fields[1:])
    else:
      record.extend(fields)
    if len(record) > field_count:
      raise Exception("Concatenating records overflowed number of fields",record)
    elif len(record) == field_count:
      yield record
      record = []

sample = """2013-03-24 15:49:40.575175 EST|aaa|tsi|p1753|th2056569632|172.30.10.212|56809|2013-03-24 15:49:32 AFT|10354453|con2326|cmd7|seg-1||dx318412|x10354453|sx1|LOG: |00000|statement: SET DATESTYLE = "ISO"; Select * 
from bb 
where cc='1'||||||SET DATESTYLE = "ISO"; Select * from bb where cc='1'|0||postgres.c|1447|
2013-04-10 12:45:48.277080 EST|aa|tsi|p22814|th1093698336|172.30.0.186|3304|2013-04-10 12:44:29 AFT|10400046|con67|cmd5|seg-1||dx341|x10400046|sx1|LOG: |00000|statement: create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)||||||create table xx as (select r.xx,sum(r."XX"),c.dd from region_RR r, cat_CC c
where r.aa=c.vv
group by 1)
|0||postgres.c|1447|"""

for record in records_from_file(StringIO(sample)):
  print record[4]

产量：

th2056569632
th1093698336

用固定数量的字段解析awk

3 个答案: