Mapping two datasets in Perl

Asked: 2011-02-04 14:04:17

Tags: perl parsing mapping analysis

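I have a dataset containing a list of user agents (UAs) and the devices that correspond to those UAs. There is another dataset that has other data along with the user agents. I need a way to identify the devices in that data.

So I have to map the UAs across the two files and then get the corresponding device information from the file that contains the list. I have already built a list of UAs from the first file and matched it against the UAs in the data file. How do I get the corresponding information from the first file, which contains the device info, and write it to a file?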

#!/usr/bin/perl

use warnings;
use strict;

our $inputfile = $ARGV[0];
our $outputfile = "$inputfile" . '.devidx';
our $devid_file = "devid_master";  # the file that has the UA and the corresponding device info
our %ua_list_hash = ();

# Create a list of mobile user agents in the devid_master file
 open DEVID, "$devid_file" or die "can't open $devid_file";

 while(<DEVID>) {
        chomp;
        my @devidfile = split /\t/;
        $ua_list_hash{$devidfile[1]} = 0;
 }  

 open IN,"$inputfile" or die "can't open $inputfile";       
 while(<IN>) {      
       chomp;       
       my @hhfile = split /\t/;       

       if(exists $ua_list_hash{$hhfile[24]}) {     
                     # how do I get the rest of the columns from the devidfile, columns 2...10?
       }
 }

 close IN;

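Or is there a better way to do this in Perl? That is always welcome :).

2 Answers:

Answer 0 (score: 2)

When building the first lookup hash, couldn't you store a reference to the other columns' data as the hash value, instead of just 0?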

#!/usr/bin/perl

use warnings;
use strict;

our $inputfile = $ARGV[0];
our $outputfile = "$inputfile" . '.devidx';
our $devid_file = "devid_master";  # the file that has the UA and the corresponding device info
our %ua_list_hash = ();

# Create a list of mobile user agents in the devid_master file
 open DEVID, "$devid_file" or die "can't open $devid_file";
 while(<DEVID>) {
        chomp;
        my @devidfile = split /\t/;
        # save the columns you'll want to access later and
        # store a reference to them as the hash value
        my @values = @devidfile[2..$#devidfile];   
        $ua_list_hash{$devidfile[1]} = \@values;
 }  

 open IN,"$inputfile" or die "can't open $inputfile";       
 while(<IN>) {      
       chomp;       
       my @hhfile = split /\t/;       

       if(exists $ua_list_hash{$hhfile[24]}) {
           my @rest_of_vals = @{ $ua_list_hash{$hhfile[24]} };
           # do something with @rest_of_vals
       }
 }

 close IN;

Note: I have not tested this.
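
For what it's worth, here is a minimal self-contained sketch of the same idea that runs without the real files. The sample rows, the UA names, and the `device_info` helper are made up for illustration; only the column layout (UA in column 1, device info from column 2 onward) comes from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rows standing in for devid_master:
# column 0 is an id, column 1 is the UA, columns 2+ are device info.
my @devid_rows = (
    "1\tUA-Alpha\tAcme\tPhone X\ttouch",
    "2\tUA-Beta\tGlobex\tTablet Y\tstylus",
);

# Map each UA to a reference to an array of its device columns.
my %device_for;
for my $row (@devid_rows) {
    my @cols = split /\t/, $row;
    $device_for{ $cols[1] } = [ @cols[ 2 .. $#cols ] ];
}

# Return the device columns for a UA, or an empty list if it is unknown.
sub device_info {
    my ($ua) = @_;
    return unless defined $ua && exists $device_for{$ua};
    return @{ $device_for{$ua} };
}

print join( "\t", device_info("UA-Alpha") ), "\n";
```

Checking `defined $ua` before the lookup avoids a warning when a data row has fewer columns than expected.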

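Answer 1 (score: 0)

What do you want the output to look like? A list of all the unique devices that appear in $inputfile? Or, for each line in $inputfile, one output line showing which device it is?

I'll answer the latter, since you can always run a unique sort on it afterwards if needed. Also, it looks like each UA has multiple devices. As a general approach, you can store the UA name as the key in a hash, and the value can be either an array of device names or a character-delimited string of device names.

If you know that the device names are elements 2..10, you can use a slice and the join operator to build, for example, a comma-separated string of device names. That string becomes the value assigned to the UA name key.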

    #!/usr/bin/perl

    use warnings;
    use strict;

    our $inputfile = $ARGV[0];
    our $outputfile = "$inputfile" . '.devidx';
    our $devid_file = "devid_master";  # the file that has the UA and the corresponding device info
    our %ua_list_hash = ();

    # Create a list of mobile user agents in the devid_master file
     open DEVID, "$devid_file" or die "can't open $devid_file";

     while(<DEVID>) {
            chomp;
            my @devidfile = split /\t/;
            my @slice = @devidfile[2..10];
            my $deviceString = join(",", @slice);
            $ua_list_hash{$devidfile[1]} = $deviceString;
     }  

     my $outputfilename = "output.txt";
     open IN,"$inputfile" or die "can't open $inputfile";
     open OUT, ">", $outputfilename or die "can't open $outputfilename";
     while(<IN>) {      
           chomp;       
           my @hhfile = split /\t/;       

           if(exists $ua_list_hash{$hhfile[24]}) {     
               print OUT $ua_list_hash{$hhfile[24]}."\n";   

           }
     }

     close IN;
     close OUT;
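
As a rough sketch of the whole round trip, the version below uses lexical filehandles and three-argument `open`, and appends the device columns to each matching input row instead of printing them alone. The `load_devices` and `annotate` helpers, the temporary sample files, and the UA sitting in column 1 of the sample data are assumptions for the demo; with the real data the UA column would be 24, as in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read the master file and map each UA (column 1) to its device columns,
# joined back into one tab-separated string.
sub load_devices {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    my %device_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @cols = split /\t/, $line;
        next unless @cols > 2;    # skip rows with no device columns
        $device_for{ $cols[1] } = join "\t", @cols[ 2 .. $#cols ];
    }
    close $fh;
    return \%device_for;
}

# Write every input row whose UA is known, with the device columns appended.
# Returns the number of rows written.
sub annotate {
    my ( $device_for, $in_path, $out_path, $ua_col ) = @_;
    open my $in,  '<', $in_path  or die "can't open $in_path: $!";
    open my $out, '>', $out_path or die "can't open $out_path: $!";
    my $matched = 0;
    while ( my $line = <$in> ) {
        chomp $line;
        my $ua = ( split /\t/, $line )[$ua_col];
        next unless defined $ua && exists $device_for->{$ua};
        print {$out} "$line\t$device_for->{$ua}\n";
        $matched++;
    }
    close $in;
    close $out or die "can't close $out_path: $!";
    return $matched;
}

# Demo with made-up files in a temporary directory.
my $dir    = tempdir( CLEANUP => 1 );
my %sample = (
    "$dir/devid_master" => "1\tUA-Alpha\tAcme\tPhone X\n",
    "$dir/data"         => "row1\tUA-Alpha\nrow2\tUA-Unknown\n",
);
for my $path ( keys %sample ) {
    open my $fh, '>', $path or die "can't open $path: $!";
    print {$fh} $sample{$path};
    close $fh;
}

my $count = annotate( load_devices("$dir/devid_master"),
    "$dir/data", "$dir/data.devidx", 1 );
print "matched $count row(s)\n";
```

Keeping the original row in the output means it can still be sorted or de-duplicated later without losing the link between the data and the device.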