Question

I have a file, file.txt, which contains the following data:

24/9/2018 15:35:19.380 B63201C<br>
24/9/2018 15:35:22.350 ES0101C(hour_start)<br>
24/9/2018 15:36:13.231 Execute service next : 0003<br>
24/9/2018 15:38:13.664 Result of the execution 0003 Result: 0003<br>
24/9/2018 15:39:10.664 Executing the transaction PE20<br>
24/9/2018 15:35:26.773 ES0101C(hour_end)<br>
24/9/2018 15:36:12.164 B63201C<br>
- 1 bloque -<br>
24/9/2018 17:16:17.428 B63201C<br>
24/9/2018 17:16:29.031 ES0101C(hour_start)<br>
24/9/2018 17:16:13.231 Execute service next : 0003<br>
24/9/2018 17:18:13.664 Result of the execution 0003 Result: 0003<br>
24/9/2018 17:19:10.664 Executing the transaction BE15<br>
24/9/2018 17:25:26.773 ES0101C(hour_end)<br>
24/9/2018 17:26:12.164 B63201C<br>
- 2 bloque -<br>

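I need to extract the data in CSV format with the following fields:

Date, hour start, hour end, B63201C-ES0101C, transaction

In other words, the captured data would be: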

> 24/9/2018,15:35:22.350,15:35:26.773,B63201C-ES0101C,PE20
> 24/9/2018,17:16:29.031,17:25:26.773,B63201C-ES0101C,BE15

Is there a way to do this in Bash or AWK?

Answer 1

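#!/usr/bin/awk -f
# Transaction line: strip <br> and remember the transaction code (last field).
/Executing the transaction / {
  sub("<br>","")
  transaction=$NF
}
# hour_start line: take the date and time; the first ID is field 3 of the previous line.
/\(hour_start\)/ {
  date=$1
  hour_start=$2
  id1=prev
}
# hour_end line: take the time and the second ID (the text before "("), then print the CSV row.
/\(hour_end\)/ {
  hour_end=$2
  split($3,a,"(")
  id2=a[1]
  printf "%s,%s,%s,%s-%s,%s\n", date, hour_start, hour_end, id1, id2, transaction
}
# Every line: strip <br> and save field 3 so the next line can use it as prev.
{
  sub("<br>","")
  prev=$3
}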

With the sample code in an executable file named script and the sample input in a file named input:

$ ./script input
24/9/2018,15:35:22.350,15:35:26.773,B63201C-ES0101C,PE20
24/9/2018,17:16:29.031,17:25:26.773,B63201C-ES0101C,BE15
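
If you would rather not keep a separate script file, the same logic can be wrapped in a shell function. This is only a sketch under a few assumptions of mine: the function name extract_csv and the header row names are made up for illustration, and clearing transaction after each row is an addition so that a block without an "Executing the transaction" line does not inherit the code from the previous block.

```shell
#!/bin/sh
# extract_csv FILE  (hypothetical helper name, not part of the original answer)
# Prints one CSV row per hour_start/hour_end pair found in FILE.
extract_csv() {
  awk '
    # Header row; the column names are an assumption based on the field list in the question.
    BEGIN { print "date,hour_start,hour_end,id,transaction" }

    # Strip the trailing <br> once for every line, before any other rule runs.
    { sub("<br>", "") }

    # Remember the transaction code (last field of the line).
    /Executing the transaction / { transaction = $NF }

    # hour_start line: date, start time, and the ID seen on the previous line.
    /\(hour_start\)/ { date = $1; hour_start = $2; id1 = prev }

    # hour_end line: end time and second ID (text before the parenthesis), then emit the row.
    /\(hour_end\)/ {
      hour_end = $2
      split($3, a, "[(]")
      printf "%s,%s,%s,%s-%s,%s\n", date, hour_start, hour_end, id1, a[1], transaction
      transaction = ""   # reset so the next block starts clean
    }

    # Save field 3 so the next line can use it as the first ID.
    { prev = $3 }
  ' "$1"
}
```

After sourcing or pasting the function, call it as extract_csv file.txt > out.csv. Drop the BEGIN line if you do not want a header row.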
