What is the best way to work with two arrays of hashes in this situation?

Date: 2016-11-03 16:43:38

Tags: perl-data-structures

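What is the best way to process these two arrays of hashes? The first data set contains XML data, and the second comes from a CSV file. The goal is to check whether each file name in the second data set also appears in the first and, if so, to calculate the file's delivery delay. I am not sure how best to build a workable hash from them (whether to change the existing hashes so that the file name is the key, or to merge them somehow). Any feedback would be appreciated.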

Data set 1 (XML data):

$VAR1 = [
      {
        'StartTimestamp' => 1478146371,
        'EndTimestamp' => 1478149167,
        'FileName' => 'a3_file_20161024.req',
        'Stage' => 'SentUserResponse'
      },
      {
        'StartTimestamp' => 1478146375,
        'EndTimestamp' => 1478149907,
        'FileName' => 'a2_file_20161024.req',
        'Stage' => 'SentUserResponse'
      },
      {
        'StartTimestamp' => 1478161030,
        'EndTimestamp' => 1478161234,
        'FileName' => 'file_DEX_0.req',
        'Stage' => 'SentUserResponse'
      },

Data set 2, from the CSV file:

$VAR1 = [
      {
        'FileName' => 'a3_file_20161024.req',
        'ExpectedTime' => '20:04:07'
      },
      {
        'FileName' => 'a2_file_20161024.req',
        'ExpectedTime' => '20:14:39'
      },
      {
        'FileName' => 'file_DEX_0.req',
        'ExpectedTime' => '20:48:40'
      },

Code used:

sub Demo {
my $api_ref = GetData($apicall);
my $csvdata = ReadDataFile();
print Dumper($api_ref);
print "-------------------------*********--------------************------------------\n";
print Dumper ($csvdata);
print "#####################\n";

}

sub ReadDataFile {
    my $parser = Text::CSV::Simple->new;
    $parser->field_map(qw/FileName ExpectedTime/);
    my @csv_data = $parser->read_file($datafile);
    return \@csv_data;

}

sub GetData {
my ($xml) = @_;
my @api_data;
my %request;
my $t = XML::Twig->new(
    twig_handlers => {
        '//UserRequest' => sub {
            push @api_data, {%request} if %request;
            %request = ();
            $_->purge;    # free memory
        },
        '//UserRequest/HomeFileName' => sub {
            $request{FileName} = $_->trimmed_text;
        },
        '//UserRequest/Stage' => sub {
            $request{Stage} = $_->trimmed_text;
        },
        '//UserRequest/StartTimestamp' => sub {
            $request{StartTimestamp} = str2time(substr($_->trimmed_text, -8));
        },
        '//UserRequest/EndTimestamp' => sub {
            $request{EndTimestamp} = str2time(substr($_->trimmed_text, -8));
        },
    },
);
$t->xparse($xml);
$t->purge;
return \@api_data;

}

1 answer:

Answer 0 (score: 0)

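I assume that you can map the elements of the first array to the elements of the second by comparing file names, and that the relation is 1:1. I would take the following steps: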

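  1. Sort the lists by file name, or generate an index hash
  2. Merge the two sets into a single array of hashes, or use the index to work through the whole data set
  3. Perform whatever operations you need on the data set

Just a small example: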

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    
    
    my $api_ref = [
        {
            'StartTimestamp' => 1478146371,
            'EndTimestamp'   => 1478149167,
            'FileName'       => 'a3_file_20161024.req',
            'Stage'          => 'SentUserResponse'
        },
        {
            'StartTimestamp' => 1478146375,
            'EndTimestamp'   => 1478149907,
            'FileName'       => 'a2_file_20161024.req',
            'Stage'          => 'SentUserResponse'
        },
        {
            'StartTimestamp' => 1478161030,
            'EndTimestamp'   => 1478161234,
            'FileName'       => 'file_DEX_0.req',
            'Stage'          => 'SentUserResponse'
        }
    ];
    
    my $csvdata = [
        {
            'FileName'     => 'a3_file_20161024.req',
            'ExpectedTime' => '20:04:07'
        },
        {
            'FileName'     => 'a2_file_20161024.req',
            'ExpectedTime' => '20:14:39'
        },
        {
            'FileName'     => 'file_DEX_0.req',
            'ExpectedTime' => '20:48:40'
        }
    ];
    
    # generate the index
    my %index = ();
    
    
    for my $i ( 0 .. $#{$api_ref} ) {
        $index{ $api_ref->[$i]{FileName} }{api_idx} = $i;
    }
    
    for my $i ( 0 .. $#{$csvdata} ) {
        $index{ $csvdata->[$i]{FileName} }{csv_idx} = $i;
    }
    
    # keep only the file names that are present in both data sets
    my @filename_intersection =
      grep { exists $index{$_}{api_idx} && exists $index{$_}{csv_idx} }
      ( keys %index );
    
    foreach my $filename (@filename_intersection) {
    
        # do something with
        my $api_entry = $api_ref->[ $index{$filename}{api_idx} ];
        my $csv_entry = $csvdata->[ $index{$filename}{csv_idx} ];
    
        # example: convert ExpectedTime into seconds and compare it to the
        # Start/End time difference
        my ( $h, $m, $s ) =
          $csv_entry->{ExpectedTime} =~ /^(\d{2}):(\d{2}):(\d{2})$/
          or next;    # skip rows whose ExpectedTime is not HH:MM:SS
        my $exp_sec  = ( $h * 60 + $m ) * 60 + $s;
        my $real_sec = $api_entry->{EndTimestamp} - $api_entry->{StartTimestamp};
    
        # no trailing colon, so the message reads "was in time; ..." or "was late; ..."
        my $msg = $exp_sec >= $real_sec ? "in time" : "late";
    
        printf
          "Filename %s was %s; expected time: %d seconds, real time: %d seconds\n",
          $filename, $msg, $exp_sec, $real_sec;
    }
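
If you do not need the array positions, a variation on the index above is to key a hash by file name and store the record references directly, which is what the question asks about. The following is only a minimal sketch: `index_by_filename` is a hypothetical helper name, the sample rows are shortened copies of the data in the question plus one made-up row (`not_in_api.req`), and it assumes each file name occurs at most once (on duplicates the last record wins).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: turn an array of hashes into a hash keyed by FileName.
# If a file name occurs more than once, the last record wins.
sub index_by_filename {
    my ($records) = @_;
    my %by_name;
    $by_name{ $_->{FileName} } = $_ for @{$records};
    return \%by_name;
}

# Sample records shaped like the two data sets in the question.
my $api_ref = [
    {
        FileName       => 'a3_file_20161024.req',
        StartTimestamp => 1478146371,
        EndTimestamp   => 1478149167,
    },
    {
        FileName       => 'file_DEX_0.req',
        StartTimestamp => 1478161030,
        EndTimestamp   => 1478161234,
    },
];
my $csvdata = [
    { FileName => 'a3_file_20161024.req', ExpectedTime => '20:04:07' },
    { FileName => 'not_in_api.req',       ExpectedTime => '21:00:00' },    # made-up row
];

my $api_by_name = index_by_filename($api_ref);

# Walk the CSV rows and look each file name up in the API hash.
for my $csv_entry ( @{$csvdata} ) {
    my $name = $csv_entry->{FileName};
    if ( my $api_entry = $api_by_name->{$name} ) {
        printf "%s: found, EndTimestamp %d\n", $name, $api_entry->{EndTimestamp};
    }
    else {
        print "$name: not present in the API data\n";
    }
}
```

Since the goal is to check the CSV file names against the XML data, only the XML side needs to be indexed; the CSV rows can be walked in order.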
    

Best, Frank
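
The example above treats ExpectedTime as a duration. If it is instead a clock time by which the file should have arrived, the delay could be computed roughly as below. This is a sketch under assumptions that the question does not confirm: the clock time is taken to be in UTC and on the same calendar day as EndTimestamp, and `delay_seconds` is a hypothetical helper name. For local time, `localtime` and `timelocal` would be the equivalents. The timestamp in the usage line is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: number of seconds by which $end_epoch is later than
# the HH:MM:SS clock time $expected on the same UTC calendar day.
# A negative result means the file arrived early.
# Returns undef if $expected is not in HH:MM:SS form.
sub delay_seconds {
    my ( $end_epoch, $expected ) = @_;
    my ( $h, $m, $s ) = $expected =~ /^(\d{2}):(\d{2}):(\d{2})$/
      or return;

    # Take the calendar day from the end timestamp (UTC is an assumption).
    my ( $mday, $mon, $year ) = ( gmtime($end_epoch) )[ 3, 4, 5 ];
    my $expected_epoch = timegm( $s, $m, $h, $mday, $mon, $year + 1900 );

    return $end_epoch - $expected_epoch;
}

# Illustrative call: the end timestamp is one minute after 20:04:07 UTC.
my $delay = delay_seconds( 1478203507, '20:04:07' );
printf "delay: %d seconds\n", $delay;
```

A file that legitimately finishes after midnight would need extra handling, because the sketch always compares against the expected time on the end timestamp's own day.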