Bash: parse a log file and find duplicates by condition

Time: 2015-10-01 20:39:39

Tags: bash

I need to parse a log file like this:

    151.67.79.39/mnt3/WkJWwe3eYp/2w8PNGLrBh/158
    95.245.46.253/storage1/FV3QLXuaDG/PlfwC4BtV9/254
    151.75.214.206/storage1/DeOq0ej9B2/fr48SLpuri/80
    87.17.174.236/storage1/IDtx9c2p7i/VwTNiwHAJF/255
    87.17.174.236/storage1/IDtx9c2p7i/VwTNiwHAJF/255
    87.17.174.118/storage1/IDtx9c2p7i/VwTNiwHAJF/255
    87.17.174.236/storage1/IDtx9c2p7i/VwTNiwHAJF/255
    87.161.130.61/storage1/IDtx9c2p7i/VwTNiwHAJF/255
    62.43.164.247/storage1/eDoT6fI4vp/76GwaRzJCL/31
    93.229.17.99/mnt3/uQi9iiyMZA/G83FZV2zCB/160
    151.75.214.206/storage1/DeOq0ej9B2/fr48SLpuri/80
    93.40.125.31/storage1/4mN9uJGwA2/0uOM39Gx8g/10
    95.245.46.253/storage1/FV3QLXuaDG/PlfwC4BtV9/254
    151.75.214.206/storage1/DeOq0ej9B2/fr48SLpuri/80
    87.17.174.236/storage1/IDtx9c2p7i/VwTNiwHAJF/255
    151.75.214.206/storage1/DeOq0ej9B2/fr48SLpuri/80
    95.245.46.253/storage1/FV3QLXuaDG/PlfwC4BtV9/254
    151.75.214.206/storage1/DeOq0ej9B2/fr48SLpuri/80
    94.38.149.210/storage1/RXhISkEsOw/AHwro83Lyp/97
    95.245.46.253/storage1/FV3QLXuaDG/PlfwC4BtV9/254
    151.75.214.206/storage1/DeOq0ej9B2/fr48SLpuri/80
    95.245.46.253/storage1/FV3QLXuaDG/PlfwC4BtV9/254
    151.75.214.206/storage1/DeOq0ej9B2/fr48SLpuri/80
...

Each line has the format [IP] / [ITEM1] / [ITEM2] / [ITEM3] / [ITEM4].

I need to find all IPs that share the same ITEM2. Any ideas? Thanks :))

1 answer:

Answer 0 (score: 1)

awk to the rescue!

    $ tr -d ' ' <file | awk -F"/" -v OFS=, '
            {a[$3]=a[$3]?a[$3] OFS $1:$1}
         END{for(i in a) print i":"a[i]}'

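prints the following (the line order follows awk's for (i in a) iteration, which is unspecified, so yours may differ):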

    RXhISkEsOw:94.38.149.210
    uQi9iiyMZA:93.229.17.99
    FV3QLXuaDG:95.245.46.253,95.245.46.253,95.245.46.253,95.245.46.253,95.245.46.253
    eDoT6fI4vp:62.43.164.247
    IDtx9c2p7i:87.17.174.236,87.17.174.236,87.17.174.118,87.17.174.236,87.161.130.61,87.17.174.236
    4mN9uJGwA2:93.40.125.31
    WkJWwe3eYp:151.67.79.39
    DeOq0ej9B2:151.75.214.206,151.75.214.206,151.75.214.206,151.75.214.206,151.75.214.206,151.75.214.206,151.75.214.206

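The leading tr removes the spaces from the input file; if your file has no spaces, you can drop it and give the file to awk directly.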
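For comparison, here is a minimal pure-bash sketch of the same grouping, assuming bash 4 or later (needed for associative arrays). The function name group_ips_by_item2 is hypothetical; it reads log lines on stdin, strips spaces the way tr -d ' ' does, and prints ITEM2:IP,IP,... lines. A bash read loop is typically much slower than awk on large logs, so treat it as an illustration of the idea rather than a replacement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: group IPs by ITEM2 using a bash 4+ associative array.
# Input (stdin): lines of the form IP/ITEM1/ITEM2/ITEM3/ITEM4.
group_ips_by_item2() {
    local -A groups=()
    local line ip item2 key
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line// /}                          # drop spaces, like tr -d ' '
        IFS=/ read -r ip _ item2 _ <<< "$line"    # split on "/"
        [[ -z $ip || -z $item2 ]] && continue     # skip blank/malformed lines
        if [[ -n ${groups[$item2]+set} ]]; then
            groups[$item2]+=",$ip"                # append in input order
        else
            groups[$item2]=$ip
        fi
    done
    # Like awk's "for (i in a)", bash gives no guaranteed key order.
    for key in "${!groups[@]}"; do
        printf '%s:%s\n' "$key" "${groups[$key]}"
    done
}

# Example run on a few lines from the question:
printf '%s\n' \
    '87.17.174.236/storage1/IDtx9c2p7i/VwTNiwHAJF/255' \
    '87.17.174.118/storage1/IDtx9c2p7i/VwTNiwHAJF/255' \
    '93.40.125.31/storage1/4mN9uJGwA2/0uOM39Gx8g/10' |
    group_ips_by_item2
```

The example prints two lines, IDtx9c2p7i:87.17.174.236,87.17.174.118 and 4mN9uJGwA2:93.40.125.31, in no particular order.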

Update

If you want all UNIQUE IPs rather than all IPs, that's a different task, but awk is up to it.

    $ tr -d ' ' <file | awk -F"/" -v OFS=, '
           {k=$3 FS $1}
      !d[k]{a[$3]=a[$3]?a[$3] OFS $1:$1;d[k]++}
        END{for(i in a) print i":"a[i]}'

gives the following (uniqueness is defined per ITEM2: an IP is listed once for each ITEM2 it appears with):

    RXhISkEsOw:94.38.149.210
    uQi9iiyMZA:93.229.17.99
    FV3QLXuaDG:95.245.46.253
    eDoT6fI4vp:62.43.164.247
    IDtx9c2p7i:87.17.174.236,87.17.174.118,87.161.130.61
    4mN9uJGwA2:93.40.125.31
    WkJWwe3eYp:151.67.79.39
    DeOq0ej9B2:151.75.214.206

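Update 2

I hope you're not writing specs for programmers :)

So you want the list of unique IPs for each ITEM2, but only when that list has more than one entry: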

    $ tr -d ' ' <file | awk -F"/" -v OFS=, '
          {k=$3 FS $1}
     !d[k]{a[$3]=a[$3]?a[$3] OFS $1:$1;d[k]++;c[$3]++}
       END{for(i in a) if(c[i]>1) print i":"a[i]}'

prints

    IDtx9c2p7i:87.17.174.236,87.17.174.118,87.161.130.61
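If you only need to know which ITEM2 values occur with more than one distinct IP, a sketch using the standard sort, cut and uniq tools instead of awk counters might look like this. LC_ALL=C makes sort compare raw bytes, which keeps all ITEM2/IP lines for the same ITEM2 adjacent so that uniq -d can spot them. The function name item2_with_multiple_ips is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each ITEM2 that appears with 2+ distinct IPs.
item2_with_multiple_ips() {
    tr -d ' ' |
        awk -F'/' 'NF >= 3 { print $3 "/" $1 }' |  # emit "ITEM2/IP" pairs
        LC_ALL=C sort -u |                          # distinct pairs, grouped by ITEM2
        cut -d'/' -f1 |                             # keep only ITEM2
        uniq -d                                     # ITEM2 values seen more than once
}

# Example: IDtx9c2p7i has two distinct IPs, FV3QLXuaDG only one.
printf '%s\n' \
    '87.17.174.236/storage1/IDtx9c2p7i/VwTNiwHAJF/255' \
    '87.17.174.118/storage1/IDtx9c2p7i/VwTNiwHAJF/255' \
    '95.245.46.253/storage1/FV3QLXuaDG/PlfwC4BtV9/254' \
    '95.245.46.253/storage1/FV3QLXuaDG/PlfwC4BtV9/254' |
    item2_with_multiple_ips
# prints: IDtx9c2p7i
```

Unlike the awk version, this lists only the keys, not the IPs, but it avoids keeping every group in memory at once.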

If you don't need the ITEM2 prefix, delete i":" from the awk print statement. If you need a separator other than a comma, change the OFS value.