AWK - How to find unique item IDs and print their first and last occurrence

Posted: 2017-08-25 08:58:23

Tags: awk grep

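I have a large text file with multiple lines of data per item. An item can have up to 15 different lines, but all of them are linked by a field called itemId, e.g. itemId=<12560317>. Each line starts with a timestamp, e.g. 170209 035711 0792.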

170209 035711 0638 DE(N) ItemHandler.ItemLog event=<DESTINATION_REPLY>, *********************, itemId=<12560284>, globalId=<12560284>, cmcIndex=<728>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, ccReason=<SCANNER_DATA_ADDED>, 
170209 035711 0638 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM>, *************************, itemId=<12560284>, globalId=<12560284>, cmcIndex=<728>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, PendingchuteGroup=<[3000]: Parked0>, Pendingstrategy=<notSpecified>, CscdestinationId=<-1: UnDef>, CmcdestinationId=<4099: All Scanners>, position=<sorter#0.scanner#4000: SCAN01>, itemRevisionNumber=<7> ##[
170209 035711 0715 DE(N) ItemHandler.ItemLog event=<SCANNER_RESULT>, ************************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForData>, barcodeCount=<4>
170209 035711 0715 DE(N) ItemHandler.ItemLog event=<DESTINATION_REQUEST>, *******************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForData>, barcodes=<[ProxyWrapperBarcode(barcode=<JJD014600001372909310>,
170209 035711 0717 DE(N) ItemHandler.ItemLog event=<DISCHARGE_ATTEMPTED>, *******************, itemId=<12560209>, globalId=<12560209>, cmcIndex=<653>, sorter=<0: MS01>, state=<CSC: ProjectHeadingForChute>, CscdestinationId=<19: CHU208>, chuteGroup=<[17, 19, 21]: [CHU207, CHU208, CHU209]>, CmcdestinationId=<19: CHU208>, position=<sorter#0: MS01>, itemRevisionNumber=<16> ##[
170209 035711 0719 DE(N) ItemHandler.ItemLog event=<DESTINATION_REPLY>, *********************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, ccReason=<SCANNER_DATA_ADDED>, PendingccResult=<OK>, Pendingstrategy=<notSpecified>,
170209 035711 0719 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM>, *************************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, PendingchuteGroup=<[3000]: Parked0>, Pendingstrategy=<notSpecified>, CscdestinationId=<-1: UnDef>, CmcdestinationId=<-1: UnDef>, position=<sorter#0.scanner#4001: IU04-SCAN02>, itemRevisionNumber=<4> ##[
170209 035711 0792 DE(N) ItemHandler.ItemLog event=<ITEM_AT_INDUCTION>, *********************, itemId=<12560317>, globalId=<12560317>, cmcIndex=<761>, sorter=<0: MS01>, state=<CSC: ProjectIdle>, inductionId=<3: IU04>, position=<sorter#0.induction#3: IU04>, itemRevisionNumber=<0> ##[
170209 035711 0792 DE(N) ItemHandler.ItemLog event=<SET_ITEM_ID>, ***************************, itemId=<12560317>, globalId=<12560317>, cmcIndex=<761>, sorter=<0: MS01>, state=<CSC: ProjectIdle>, itemRevisionNumber=<0> ##[
170209 035711 0794 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM_REPLY>, *******************, itemId=<12560284>, globalId=<12560284>, cmcIndex=<728>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, CscdestinationId=<3000: Parked0>, chuteGroup=<[3000]: Parked0>, CmcdestinationId=<3000: Parked0>, position=<sorter#0.scanner#4000: SCAN01>, chuteListStartPoint=<-1>, itemRevisionNumber=<9> ##[
170209 035711 0795 DE(N) ItemHandler.ItemLog event=<RECONVERT>, *****************************, itemId=<12560284>, globalId=<12560284>, cmcIndex=<728>, sorter=<0: MS01>, state=<CSC: WaitForData>, CscdestinationId=<3000: Parked0>, chuteGroup=<[3000]: Parked0>, CmcdestinationId=<3000: Parked0>, position=<sorter#0.scanner#4000: SCAN01>, chuteListStartPoint=<-1>, itemRevisionNumber=<10> ##[
170209 035711 0795 DE(N) ItemHandler.ItemLog event=<DESTINATION_REQUEST>, *******************, itemId=<12560284>, globalId=<12560284>, cmcIndex=<728>, sorter=<0: MS01>, state=<CSC: WaitForData>, barcodes=<[ProxyWrapperBarcode(barcode=<JJD014600004019604475>, type=<C0>, result=<OK>, ccType=<>), 
170209 035711 0797 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM_REPLY>, *******************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, CscdestinationId=<3000: Parked0>, chuteGroup=<[3000]: Parked0>, CmcdestinationId=<3000: Parked0>,
170209 035711 0798 DE(N) ItemHandler.ItemLog event=<ITEM_INDUCTED>, *************************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForData>, inductionId=<3: IU04>, inductionMode=<SCANNER>, inductStatus=<NORMAL_ITEM>, carrierId=<469>, carrierCount=<1>, CmcdestinationId=<3000: Parked0>, position=<sorter#0: MS01>, itemRevisionNumber=<7> ##[

Goal:

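What I want to do, using gawk on Windows, is find the first occurrence of each itemId and grab its date and time, then find its last occurrence and grab that date and time too, and put both on one line, like this: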

ITEMID  170209 035711   170209 035932

Is there a way to do this with grep, awk, or a combination of both?

Thanks

2 answers:

Answer 0 (score: 2)

I would write:

=MATCH(REPLACE(A1,5,1,"e"),A:A)

Do you need the output sorted by date, by ID, or something else? Do you only want to look up one id at a time?

Answer 1 (score: 1)

The one-liner is:

gawk '{ a = gensub(/([0-9]{6} [0-9]{6} [0-9]{4}).*itemId=<([0-9]+)>.*/, "\\2 \\1", "g", $0); b = split(a, c, " "); if (c[1] in result) result[c[1]] = gensub(/(.+),(.+)/, "\\1," c[2] " " c[3] " " c[4], "g", result[c[1]]); else result[c[1]] = c[2] " " c[3] " " c[4] "," c[2] " " c[3] " " c[4]} END { for (i in result) print i ": " result[i]}' test.txt

Let me explain:

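  • the variable a holds the itemId and the date from the line
  • we split a on spaces: c[1] holds the itemId, and c[2], c[3] and c[4] hold the parts of the date
  • if the itemId is not yet in the array "result", we store the date twice(!) in "result", indexed by the itemId
  • if the itemId is already there, we replace the second date with the newly found one.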

This leaves us with an associative array keyed by itemId, whose values are the first and last dates separated by a comma.
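For comparison, here is a minimal sketch of the same first/last bookkeeping that avoids re-parsing the stored string: it keeps two arrays, `first` (set only once per id) and `last` (overwritten on every matching line). It extracts the id with POSIX `match()`/`substr()` instead of `gensub()`, so it should also run in awks other than gawk. The temporary file and the sample lines (copied from the question and cut after globalId) are assumptions made only to keep the example self-contained. The output goes through `sort` because `for (id in first)` visits keys in no particular order.

```shell
#!/bin/sh
# Sketch: first and last timestamp per itemId, using only POSIX awk features.
# Sample input: lines from the question, shortened after globalId.
log=$(mktemp)
cat > "$log" <<'EOF'
170209 035711 0638 DE(N) ItemHandler.ItemLog event=<DESTINATION_REPLY>, *********************, itemId=<12560284>, globalId=<12560284>,
170209 035711 0638 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM>, *************************, itemId=<12560284>, globalId=<12560284>,
170209 035711 0715 DE(N) ItemHandler.ItemLog event=<SCANNER_RESULT>, ************************, itemId=<12560311>, globalId=<12560311>,
170209 035711 0717 DE(N) ItemHandler.ItemLog event=<DISCHARGE_ATTEMPTED>, *******************, itemId=<12560209>, globalId=<12560209>,
170209 035711 0792 DE(N) ItemHandler.ItemLog event=<ITEM_AT_INDUCTION>, *********************, itemId=<12560317>, globalId=<12560317>,
170209 035711 0795 DE(N) ItemHandler.ItemLog event=<RECONVERT>, *****************************, itemId=<12560284>, globalId=<12560284>,
170209 035711 0798 DE(N) ItemHandler.ItemLog event=<ITEM_INDUCTED>, *************************, itemId=<12560311>, globalId=<12560311>,
EOF

result=$(awk '
match($0, /itemId=<[0-9]+>/) {
    # RSTART/RLENGTH cover "itemId=<digits>": skip the 8-char prefix "itemId=<"
    # and drop the closing ">" (8 + 1 = 9 characters removed in total).
    id = substr($0, RSTART + 8, RLENGTH - 9)
    ts = $1 " " $2 " " $3                # date, time, sub-second field
    if (!(id in first)) first[id] = ts   # remember only the first sighting
    last[id] = ts                        # overwritten until the final sighting
}
END {
    for (id in first) print id ": " first[id] "," last[id]
}' "$log" | sort)

rm -f "$log"
printf '%s\n' "$result"
```

Lines without an itemId simply do not match the pattern and are skipped, and the id is found wherever it sits on the line.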

gawk '{ 
  a = gensub(/([0-9]{6} [0-9]{6} [0-9]{4}).*itemId=<([0-9]+)>.*/, "\\2 \\1", "g", $0);
  b = split(a, c, " "); 
  if (c[1] in result) 
     result[c[1]] = gensub(/(.+),(.+)/, "\\1" "," c[2] " " c[3] " " c[4], "g", result[c[1]]);
  else result[c[1]] = c[2] " " c[3] " " c[4] "," c[2] " " c[3] " " c[4]
} END { for (i in result) print i ": " result[i]}' test.txt

The result is:

12560311: 170209 035711 0715,170209 035711 0798
12560209: 170209 035711 0717,170209 035711 0717
12560284: 170209 035711 0638,170209 035711 0795
12560317: 170209 035711 0792,170209 035711 0792

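The order of the result lines is not meaningful: `for (i in result)` walks the array in an unspecified order. If you want the ids in the order they first appear in the log (for a chronological log, that is also the order of their first timestamps), a common awk idiom is to record each new id in a numbered array and loop over that in END. A minimal sketch, assuming the same shortened sample lines and printing id, first timestamp and last timestamp separated by spaces, as in the question:

```shell
#!/bin/sh
# Sketch: first/last timestamp per itemId, printed in order of first appearance.
# Sample input: lines from the question, shortened after globalId.
log=$(mktemp)
cat > "$log" <<'EOF'
170209 035711 0638 DE(N) ItemHandler.ItemLog event=<DESTINATION_REPLY>, *********************, itemId=<12560284>, globalId=<12560284>,
170209 035711 0638 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM>, *************************, itemId=<12560284>, globalId=<12560284>,
170209 035711 0715 DE(N) ItemHandler.ItemLog event=<SCANNER_RESULT>, ************************, itemId=<12560311>, globalId=<12560311>,
170209 035711 0717 DE(N) ItemHandler.ItemLog event=<DISCHARGE_ATTEMPTED>, *******************, itemId=<12560209>, globalId=<12560209>,
170209 035711 0792 DE(N) ItemHandler.ItemLog event=<ITEM_AT_INDUCTION>, *********************, itemId=<12560317>, globalId=<12560317>,
170209 035711 0795 DE(N) ItemHandler.ItemLog event=<RECONVERT>, *****************************, itemId=<12560284>, globalId=<12560284>,
170209 035711 0798 DE(N) ItemHandler.ItemLog event=<ITEM_INDUCTED>, *************************, itemId=<12560311>, globalId=<12560311>,
EOF

result=$(awk '
match($0, /itemId=<[0-9]+>/) {
    id = substr($0, RSTART + 8, RLENGTH - 9)   # digits between "<" and ">"
    ts = $1 " " $2 " " $3
    if (!(id in first)) {
        first[id] = ts
        order[++n] = id                        # position of the first sighting
    }
    last[id] = ts
}
END {
    # Numeric loop over order[] gives a deterministic output order.
    for (i = 1; i <= n; i++) {
        id = order[i]
        print id, first[id], last[id]
    }
}' "$log")

rm -f "$log"
printf '%s\n' "$result"
```

To sort by id instead, pipe the output through `sort`; in gawk, setting `PROCINFO["sorted_in"] = "@ind_str_asc"` before the `for (i in result)` loop is another option.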
Edit: Running it on Windows did not work properly, so here is a simplified answer:

awk "
!first[$8] {first[$8] = $1 FS $2 FS $3} 
{last[$8] = $1 FS $2 FS $3 } 
END {
for (id in first) {
   print gensub(/itemId=<([^>]+)>,/, \"\\1\", \"g\", id) FS first[id] FS last[id]}
}" Item.log

Thanks to @glennjackman for the inspiration. ;-) Note the escaped quotes, which are needed to run this on Windows.
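Another way around the Windows quoting is to keep the program in a file and pass it with `-f`, so cmd.exe never has to parse the awk quotes. A minimal sketch, assuming a hypothetical file name `first_last.awk` and, like the simplified answer, that `itemId=<...>,` is always the 8th whitespace-separated field. It strips the key with `gsub()` on a copy instead of `gensub()`, so it does not require gawk. On Windows you would save the program with an editor and run `gawk -f first_last.awk Item.log`; the heredocs below exist only to make the example self-contained.

```shell
#!/bin/sh
# Sketch: the simplified answer moved into a program file run with awk -f.
# first_last.awk is a hypothetical name; field 8 must be "itemId=<...>,".
dir=$(mktemp -d)
cat > "$dir/first_last.awk" <<'EOF'
!($8 in first) { first[$8] = $1 FS $2 FS $3 }   # first sighting of this key
{ last[$8] = $1 FS $2 FS $3 }                   # keeps the latest sighting
END {
    for (key in first) {
        id = key
        gsub(/itemId=<|>,/, "", id)             # "itemId=<123>," -> "123"
        print id, first[key], last[key]
    }
}
EOF
cat > "$dir/Item.log" <<'EOF'
170209 035711 0638 DE(N) ItemHandler.ItemLog event=<DESTINATION_REPLY>, *********************, itemId=<12560284>, globalId=<12560284>,
170209 035711 0638 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM>, *************************, itemId=<12560284>, globalId=<12560284>,
170209 035711 0715 DE(N) ItemHandler.ItemLog event=<SCANNER_RESULT>, ************************, itemId=<12560311>, globalId=<12560311>,
170209 035711 0717 DE(N) ItemHandler.ItemLog event=<DISCHARGE_ATTEMPTED>, *******************, itemId=<12560209>, globalId=<12560209>,
170209 035711 0792 DE(N) ItemHandler.ItemLog event=<ITEM_AT_INDUCTION>, *********************, itemId=<12560317>, globalId=<12560317>,
170209 035711 0795 DE(N) ItemHandler.ItemLog event=<RECONVERT>, *****************************, itemId=<12560284>, globalId=<12560284>,
170209 035711 0798 DE(N) ItemHandler.ItemLog event=<ITEM_INDUCTED>, *************************, itemId=<12560311>, globalId=<12560311>,
EOF

# sort only makes the unordered for-in output deterministic.
result=$(awk -f "$dir/first_last.awk" "$dir/Item.log" | sort)
rm -rf "$dir"
printf '%s\n' "$result"
```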