How can I extract certain fields from a text file and output them on one line per record?

Time: 2014-10-09 14:15:11

Tags: perl awk sed

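I have the output of an nmap scan in a text file. Records are separated by a line containing only --. How can I extract certain fields and print them on a single line per record, with the fields separated by a delimiter?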

Here is a sample of the input file:

--
Nmap scan report for mail.mydomain.com (146.221.53.49)
Host is up (0.23s latency).
PORT    STATE SERVICE
443/tcp open  https
| ssl-cert: Subject: commonName=mail.mydomain.com/organizationName=The Company & Co. LLC/stateOrProvinceName=Paris/countryName=FR
| Issuer: commonName=DigiCert Secure Server CA/organizationName=DigiCert Inc/countryName=US
| Public Key type: rsa
| Public Key bits: 2048
| Not valid before: 2013-12-26T00:00:00+00:00
| Not valid after:  2015-01-21T12:00:00+00:00
| MD5:   c528 4a28 4860 0a8c 112c 5f91 b63a 1d82
--
Nmap scan report for www.firstdomain.net (66.103.112.215)
Host is up (0.21s latency).
PORT    STATE SERVICE
443/tcp open  https
| ssl-cert: Subject: commonName=*.firstdomain.net/organizationName=FIRSTDOMAIN Ltd./stateOrProvinceName=Sofia/countryName=RO
| Issuer: commonName=GeoTrust SSL CA - G2/organizationName=GeoTrust Inc./countryName=US
| Public Key type: rsa
| Public Key bits: 2048
| Not valid before: 2014-09-28T23:00:00+00:00
| Not valid after:  2018-09-28T22:59:59+00:00
| MD5:   ad44 e45f f677 14d9 bccf 8198 7002 457e
--
Nmap scan report for owa.second-domain.com.com.rs (156.113.124.14)
Host is up (0.21s latency).
PORT    STATE SERVICE
443/tcp open  https
| ssl-cert: Subject: commonName=owa.second-domain.com.com.rs/organizationName=Second Corporation LP/stateOrProvinceName=Malta/countryName=MT
| Issuer: commonName=VeriSign Class 3 Secure Server CA - G3/organizationName=VeriSign, Inc./countryName=US
| Public Key type: rsa
| Public Key bits: 2048
| Not valid before: 2013-09-04T23:00:00+00:00
| Not valid after:  2014-11-04T23:59:59+00:00
| MD5:   7c54 3427 bc82 f94d 4448 3d19 6700 4fbe
--

Expected output:

146.221.53.49; mail.mydomain.com; The Company & Co. LLC; Paris; FR; DigiCert Secure Server CA; 2013-12-26; 2015-01-21; c528 4a28 4860 0a8c 112c 5f91 b63a 1d82
66.103.112.215; =*.firstdomain.net; FIRSTDOMAIN Ltd.; Sofia; RO; 2014-09-28; 2018-09-28; ad44 e45f f677 14d9 bccf 8198 7002 457e

1 answer:

Answer 0 (score: 0)

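As has already been said, simply posting a requirement and waiting for a kind soul to do your work for you is very much frowned upon. But this isn't a trivial task, and I believe the Nmap::Parser module requires XML as input, so here is something to get you started.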

use strict;
use warnings 'all';
use 5.010;
use autodie;

use constant REQUIRED_FIELDS => qw/
    host
    name
    organizationName
    stateOrProvinceName
    countryName
    issuerCommonName
    startDate
    endDate
    MD5
/;

open my $fh, '<', 'nmap.nmap';

my (@data, %item);

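while (<$fh>) {

  # A line containing only "--" ends the current record
  if (/\A--$/) {
    push @data, { %item } if %item;
    %item = ();
  }
  # Issuer fields get an "issuer" prefix so they don't clash with the Subject fields
  elsif ( m{Issuer:} ) {
    $item{'issuer'.ucfirst $1} = $2 while m{(\w+)=([^/]+)(?<=\S)}g;
  }
  # Keep only the date part of the validity timestamps
  elsif ( m{Not valid (before|after):\s*([\d-]+)} ) {
    my $key = $1 eq 'before' ? 'startDate' : 'endDate';
    $item{$key} = $2;
  }
  # The report header supplies the host name and the IP address
  elsif ( m{\ANmap scan report for ([\w.-]+)\s+\(([\d.]+)\)} ) {
    $item{name} = $1;
    $item{host} = $2;
  }
  elsif (m{(MD5):\s*([a-z0-9\s]+(?<=\S))}) {
    $item{MD5} = $2;
  }
  # Any other line: store every key=value pair it contains (the Subject line)
  else {
    $item{$1} = $2 while m{(\w+)=([^/]+)(?<=\S)}g;
  }
}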
push @data, { %item } if %item;


for my $item (@data) {
  print join('; ', @{$item}{(REQUIRED_FIELDS)}), "\n";
}

Output

146.221.53.49; mail.mydomain.com; The Company & Co. LLC; Paris; FR; DigiCert Secure Server CA; 2013-12-26; 2015-01-21; c528 4a28 4860 0a8c 112c 5f91 b63a 1d82
66.103.112.215; www.firstdomain.net; FIRSTDOMAIN Ltd.; Sofia; RO; GeoTrust SSL CA - G2; 2014-09-28; 2018-09-28; ad44 e45f f677 14d9 bccf 8198 7002 457e
156.113.124.14; owa.second-domain.com.com.rs; Second Corporation LP; Malta; MT; VeriSign Class 3 Secure Server CA - G3; 2013-09-04; 2014-11-04; 7c54 3427 bc82 f94d 4448 3d19 6700 4fbe
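
If it isn't obvious what the key=value pattern in the loop is doing, here is a minimal sketch that applies the same regex to a single Subject line from the sample input. The helper name extract_pairs and its optional prefix argument are my own for illustration and are not part of the script above. The pattern takes a word before "=", then everything up to the next "/", and the (?<=\S) lookbehind makes the match back off any trailing whitespace such as the newline.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper (not in the script above): returns every key=value
# pair found in $text, optionally prefixing each key the way the Issuer
# branch does ('issuer' . ucfirst $key).
sub extract_pairs {
    my ($text, $prefix) = @_;
    $prefix //= '';
    my %pairs;
    while ( $text =~ m{(\w+)=([^/]+)(?<=\S)}g ) {
        my ($key, $value) = ($1, $2);
        $key = $prefix . ucfirst($key) if length $prefix;
        $pairs{$key} = $value;
    }
    return %pairs;
}

# One Subject line copied from the sample input, including its newline.
my $line = '| ssl-cert: Subject: commonName=mail.mydomain.com/organizationName=The Company & Co. LLC/stateOrProvinceName=Paris/countryName=FR' . "\n";

my %subject = extract_pairs($line);
print "$_ => $subject{$_}\n" for sort keys %subject;
```

For that line it should print commonName, countryName, organizationName and stateOrProvinceName with their values, and countryName comes out as "FR" without the trailing newline. Calling extract_pairs($issuer_line, 'issuer') on an Issuer line would give keys such as issuerCommonName, which is where the issuerCommonName entry in REQUIRED_FIELDS comes from.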