Perl regular expression does not match

Time: 2015-01-25 23:58:01

Tags: regex perl

I have a text file (1.txt) that contains the following information, in this format:

{
  "ip": "X.X.XX.8",
  "hostname": "No Hostname",
  "city": "Kuala Terengganu",
  "region": "Terengganu",
  "country": "MY",
  "loc": "5.3302,103.1408",
  "org": "AS4788 TM Net, Internet Service Provider"
}{
  "ip": "X.X.XX.143",
  "hostname": "No Hostname",
  "city": "Kuantan",
  "region": "Pahang",
  "country": "MY",
  "loc": "3.8077,103.3260",
  "org": "AS4788 TM Net, Internet Service Provider"
}{
  "ip": "X.X.XXX.76",
  "hostname": "No Hostname",
  "city": "Kuching",
  "region": "Sarawak",
  "country": "MY",
  "loc": "1.5310,110.3442",
  "org": "AS4788 TM Net, Internet Service Provider",
  "postal": "93700"
}{
  "ip": "X.X.XX.158",
  "hostname": "No Hostname",
  "city": "Seoul",
  "region": "Seoul-t'ukpyolsi",
  "country": "KR",
  "loc": "37.5985,126.9783",
  "org": "AS17839 DreamcityMedia"
}{
  "ip": "XX.XXX.X.87",
  "hostname": "No Hostname",
  "city": "Surat",
  "region": "Gujarat",
  "country": "IN",
  "loc": "20.9667,72.9000",
  "org": "AS45528 Tikona Digital Networks Pvt Ltd."
}{
  "ip": "XXX.XX.XXX.134",
  "hostname": "No Hostname",
  "city": "Bhandup",
  "region": "Maharashtra",
  "country": "IN",
  "loc": "19.1500,72.9333",
  "org": "AS45528 Tikona Digital Networks Pvt Ltd."
}{

I wrote the following Perl code so that I can write the data out as a comma-separated file:

use FileHandle;
use strict;

main();

sub main() {
    my $line_numbers = "";
    my $num_matches  = 0;
    my $first_match  = "";
    my $count        = 0;

    my $resource_location = "1.txt";

    my $output_fh = FileHandle->new("> 2.txt");

    open(FILE, "<", $resource_location) or die "cannot open < $resource_location: $!";

    my $output_str = "";
    foreach my $line (<FILE>) {
        $count++;
        my ($ip)       = $line =~ /"ip=([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})"/;
        my ($hostname) = $line =~ /"hostname:?([^"\s]+)"/;
        my ($city)     = $line =~ /"city:?([^"\s]+)"/;
        my ($region)   = $line =~ /"region:?([^"\s]+)"/;
        my ($country)  = $line =~ /"country:?([^"\s]+)"/;
        my ($org)      = $line =~ /"org:?([^"\s]+)"/;

        print $output_fh "$ip,$hostname,$city,$region,$country,$org\n";
    }

    print "$count   rows processed\n";

    close FILE;
    $output_fh->close;
}

When I run the script, all I get is commas:

,,,,,
,,,,,
,,,,,
,,,,,
,,,,,
,,,,,

Expected output:

"X.X.XX.8","No Hostname","Kuala Terengganu","Terengganu","MY","AS4788 TM Net, Internet Service Provider"
"X.X.XX.143","No Hostname","Kuantan","Pahang","MY","AS4788 TM Net, Internet Service Provider"
"X.X.XXX.76","No Hostname","Kuching","Sarawak","MY","AS4788 TM Net, Internet Service Provider"
"X.X.XX.158","No Hostname","Seoul","Seoul-t'ukpyolsi","KR","AS17839 DreamcityMedia"
"XX.XXX.X.87","No Hostname","Surat","Gujarat","IN","AS45528 Tikona Digital Networks Pvt Ltd."

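What am I missing?

2 answers:

Answer 0: (score: 2)

Ack! Using an actual JSON parser to parse JSON is easier than trying to hack together a fragile, error-prone solution!

OK, you don't actually have a JSON file, but rather a bunch of JSON documents placed end to end. That's not a problem, though; the incremental parser of JSON::XS (incr_parse) can handle it.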

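use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';

use JSON::XS     qw( );
use Text::CSV_XS qw( );

my $json_parser = JSON::XS->new();
# eol => "\n" makes print() terminate each row; without it the rows run together.
my $csv_formatter = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" });

# Slurp each input file whole, then extract every complete JSON object from it.
while ( my $file = do { local $/; <> } ) {
   for my $obj ( $json_parser->incr_parse($file) ) {
      # Hash slice: pick the wanted fields in output order.
      my @row = @$obj{qw( ip hostname city region country org )};
      $csv_formatter->print(\*STDOUT, \@row);
   }
}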

Usage:

myparser.pl input.json >output.csv
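If JSON::XS and Text::CSV_XS are not installed, the same incremental approach can be sketched with the core module JSON::PP, which also provides incr_parse. This is only an illustration under stated assumptions: the helper name concatenated_json_to_csv is hypothetical, the CSV quoting is done by hand (every field quoted, embedded quotes doubled), and the sample text is a shortened copy of the data in the question.

```perl
use strict;
use warnings;
use JSON::PP ();    # core module, assumed here as a stand-in for JSON::XS

# Hypothetical helper: turn a string of back-to-back JSON objects into CSV lines.
sub concatenated_json_to_csv {
    my ($text) = @_;
    my @columns = qw( ip hostname city region country org );
    my $parser  = JSON::PP->new;
    my @lines;

    # In list context incr_parse returns every complete object it can extract.
    for my $obj ( $parser->incr_parse($text) ) {
        my @row;
        for my $value ( @$obj{@columns} ) {
            my $field = defined $value ? $value : '';   # missing key -> empty field
            $field =~ s/"/""/g;                         # CSV-escape embedded quotes
            push @row, qq("$field");
        }
        push @lines, join( ',', @row );
    }
    return @lines;
}

my $sample = <<'END';
{
  "ip": "X.X.XX.8",
  "hostname": "No Hostname",
  "city": "Kuala Terengganu",
  "region": "Terengganu",
  "country": "MY",
  "loc": "5.3302,103.1408",
  "org": "AS4788 TM Net, Internet Service Provider"
}{
  "ip": "X.X.XX.158",
  "hostname": "No Hostname",
  "city": "Seoul",
  "region": "Seoul-t'ukpyolsi",
  "country": "KR",
  "loc": "37.5985,126.9783",
  "org": "AS17839 DreamcityMedia"
}
END

print "$_\n" for concatenated_json_to_csv($sample);
```

For real data, Text::CSV_XS remains the safer choice for the output side, since it handles quoting rules that this hand-written version only approximates.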

Answer 1: (score: 0)

Try this. I don't know JSON or the JSON modules, but this simple code gives the output you expect.

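use warnings;
use strict;

# "file" is a placeholder for the input path (1.txt in the question).
open(my $data, '<', 'file') or die "cannot open file: $!";
$/ = "}";    # read one {...} record at a time
my @ar = <$data>;
close $data;

foreach (@ar) {
    my @xz = split("\n", $_);
    my @ddta;
    foreach my $v (@xz) {
        # A "key": value line splits into ('', value); any other line gives one field.
        my @xvz = split(/.*\:(.*)/, $v);
        next unless @xvz == 2;
        $xvz[1] =~ s/^\s+|,?\s*$//g;    # trim leading space and trailing comma
        push(@ddta, $xvz[1]);
    }
    print join(',', @ddta), "\n" if @ddta;
}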

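Each record in your data ends with }, so I set $/ = "}". Reading the file then gives an array with one record per element. Try print $ar[0] to see the first one.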
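To see what the record separator does in isolation, here is a minimal sketch that reads from an in-memory filehandle instead of a file. The helper name read_brace_records and the two-record sample string are assumptions made for the illustration. Note that the trick relies on } never appearing inside a value; a } within a string would cut the record short.

```perl
use strict;
use warnings;

# Hypothetical helper: read "}"-terminated records from an open filehandle.
sub read_brace_records {
    my ($fh) = @_;
    local $/ = "}";    # the change to $/ is undone when the sub returns
    my @records = <$fh>;
    # Drop a trailing fragment that has no closing brace (e.g. a dangling "{").
    return grep { /\}\z/ } @records;
}

my $sample = qq({\n  "city": "Seoul"\n}{\n  "city": "Surat"\n}{\n);
open( my $fh, '<', \$sample ) or die "cannot open in-memory file: $!";
my @records = read_brace_records($fh);
close $fh;

print scalar(@records), " records\n";
print $records[0], "\n";
```

Using local keeps the modified $/ from affecting any other reads in the program, which a plain assignment at file scope would do.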

Then split(/.*\:(.*)/,$v) captures whatever follows the : on each line. The pieces land in @xvz, get pushed onto @ddta, and the result is printed after the inner foreach loop.
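The same idea, taking what follows the colon, can also be written as an ordinary match, and doing so suggests why the patterns in the question never matched. They look for text such as "ip= or "hostname: followed directly by the value, while the data has "ip": " (closing quote, colon, space, opening quote). The class [^"\s]+ also rejects values that contain spaces, and because the file is read line by line, no single line could ever fill all six columns. The sketch below is one possible repair, not the only one; the helper names parse_field and lines_to_csv are hypothetical, and escaped quotes inside values are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: return (key, value) for a line such as
#   "city": "Kuala Terengganu",
# or an empty list for any other line. Escaped quotes inside a value
# are not handled; that is one reason a JSON parser is more robust.
sub parse_field {
    my ($line) = @_;
    return $line =~ /^\s*"([^"]+)"\s*:\s*"([^"]*)"\s*,?\s*$/;
}

# Hypothetical helper: collect fields until a closing brace, then emit one row.
sub lines_to_csv {
    my @wanted = qw( ip hostname city region country org );
    my ( %record, @rows );
    for my $line (@_) {
        if ( my ( $key, $value ) = parse_field($line) ) {
            $record{$key} = $value;
        }
        elsif ( $line =~ /^\s*\}/ && %record ) {
            # Quote every field; a key missing from the record becomes "".
            push @rows, join ',', map { sprintf '"%s"', $_ // '' } @record{@wanted};
            %record = ();
        }
    }
    return @rows;
}

my @sample = split /\n/, <<'END';
{
  "ip": "X.X.XX.158",
  "hostname": "No Hostname",
  "city": "Seoul",
  "region": "Seoul-t'ukpyolsi",
  "country": "KR",
  "loc": "37.5985,126.9783",
  "org": "AS17839 DreamcityMedia"
}{
END

print "$_\n" for lines_to_csv(@sample);
```

Accumulating into %record and emitting only at the closing brace is what turns seven input lines into a single output row.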