Analyzing census data in Perl

Time: 2013-11-22 04:48:33

Tags: perl

I need to write a Perl script that parses census data (http://pastebin.com/hNzke4V8).

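The script needs to parse the data and, for each county, print the county name, the population per square mile of land (population / land area), and the percentage of the county that is water (water area / (land area + water area)).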
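For reference, the two per-county formulas map onto Perl roughly as follows. This is a minimal sketch that assumes each data line is comma-separated in the order name, population, land area, water area (the order the code further down indexes as $fields[0] to $fields[3]); the helper name county_stats and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one CSV line, assumed to be
# "name,population,land_area,water_area", and returns the county name,
# people per square mile of land, and water as a percentage of total area.
sub county_stats {
    my ($line) = @_;
    my ($name, $pop, $land, $water) = split /,/, $line;
    my $density   = $pop / $land;                       # population / land area
    my $water_pct = 100 * $water / ($land + $water);    # water / (land + water)
    return ($name, $density, $water_pct);
}

# Made-up sample record: 1000 people, 10 sq mi of land, 30 sq mi of water.
my ($name, $density, $water_pct) = county_stats("Example County,1000,10,30");
printf "%s: %.1f people/sq mile, %.1f%% water\n", $name, $density, $water_pct;
```

Lines read from a file still need chomp before being passed in, as in the script below.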

Finally, the script needs to print the county name and value for each of the following:

Highest population density
Lowest population density
Highest percentage of water
Lowest percentage of water

Here is an example of the output:

County           Population/sq mile    % Water
Adams County     307.2                 88.1%
Asotin County    111.8                 12.6%
[... etc ...]

Highest population density: Adams County, 9999 people/square mile
Lowest population density: Pierce County, 3 people/square mile
Highest percentage of water: Whitman County, 90.2% water
Lowest percentage of water: Skagit County, 3.6% water
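One way to get that column layout is printf with fixed-width fields. A sketch, assuming one decimal place is wanted; the column widths, the helper name format_row, and the figures are illustrative choices, not part of the assignment:

```perl
use strict;
use warnings;

# Column widths (16 and 22 characters) are arbitrary choices that line up
# with the header; widen the first one if county names are longer.
my $row_fmt = "%-16s %-22s %s\n";

# Hypothetical helper: formats one county row with one decimal place.
sub format_row {
    my ($name, $density, $water_pct) = @_;
    return sprintf $row_fmt, $name,
        sprintf('%.1f', $density), sprintf('%.1f%%', $water_pct);
}

printf $row_fmt, 'County', 'Population/sq mile', '% Water';
my $example_row = format_row('Example County', 100, 75);    # made-up figures
print $example_row;

# Summary line in the same style as the expected output.
printf "Highest percentage of water: %s, %.1f%% water\n", 'Example County', 75;
```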

Here's what I've come up with so far (I'm not very familiar with Perl):

#!/usr/bin/perl -w
use strict;
use warnings;

#initialize stuff
my %water;
my %popdensity
my @fields;
my $county;
my $lowest_pop_county=0; 
my $lowest_pop=9999999;
my $highest_pop_county=0;
my $highest_pop=0;
my $lowest_water_county=0;
my $lowest_water = 1;
my $highest_water_county = 0;
my $highest_water = 0;

#parse input
while (<>)
{
next if /County Name/;
chomp;
@fields = split /,/;
$water{$fields[0]} = $fields[3] / ($fields[2] + $fields[3]);
$popdensity{$fields[0]} = $fields[1] / $fields[2];

foreach $county (keys %water %popdensity)
{
    #print values
    print "The percent water for $county is %.2f%%\n", 100 * $water{$county};
    print "The population per square mile of land for $county is $popdensity{$county}\n";

    #determine highest and lowest values
    if ($highest_pop < $popdensity{$county})
    { 
        $highest_pop = popdensity{$county}; 
        $highest_pop_county = $county;
    } 
    if ($lowest_pop > $popdensity{$county})
    { 
        $lowest_pop = popdensity{$county}; 
        $lowest_pop_county = $county; 
    }
    if ($highest_water < $water{$county})
    { 
        $highest_water = $water{$county}; 
        $highest_water_county = $county;
    }
    if ($lowest_water > $water{$county})
    { 
        $lowest_water = $water{$county}; 
        $lowest_water_county = $county; 
    }
}

#print highest and lowest values
print "Highest population density: $highest_pop_county, $highest_pop\n"
print "Lowest population density: $lowest_pop_county, $lowest_pop\n"
print "Highest percentage of water: $highest_water_county, $highest_water\n"
print "Lowest percentage of water: $lowest_water_county, $lowest_water\n"
}

Unfortunately, when I try to run the script (perl -w script.txt census.txt), I get the following errors:

Operator or semicolon missing before %popdensity at script.txt line 28.
Ambiguous use of % resolved as operator % at script.txt line 28.
syntax error at script.txt line 8, near "my "
Global symbol "@fields" requires explicit package name at script.txt line 8.
Global symbol "$lowest_pop_county" requires explicit package name at script.txt line 10.
Global symbol "$lowest_pop" requires explicit package name at script.txt line 11.
Global symbol "$highest_pop_county" requires explicit package name at script.txt line 12.
Global symbol "$highest_pop" requires explicit package name at script.txt line 13.
Global symbol "$lowest_water_county" requires explicit package name at script.txt line 14.
Global symbol "$lowest_water" requires explicit package name at script.txt line 15.
Global symbol "$highest_water_county" requires explicit package name at script.txt line 16.
Global symbol "$highest_water" requires explicit package name at script.txt line 17.
Global symbol "@fields" requires explicit package name at script.txt line 24.
Global symbol "@fields" requires explicit package name at script.txt line 25.
Global symbol "@fields" requires explicit package name at script.txt line 25.
Global symbol "@fields" requires explicit package name at script.txt line 25.
Global symbol "@fields" requires explicit package name at script.txt line 25.
Global symbol "@fields" requires explicit package name at script.txt line 26.
Global symbol "@fields" requires explicit package name at script.txt line 26.
Global symbol "@fields" requires explicit package name at script.txt line 26.
Type of arg 1 to keys must be hash (not modulus (%)) at script.txt line 29, near     "popdensity)

What am I doing wrong? Thanks in advance for your help.

2 answers:

Answer 0 (score: 1)

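A bunch of syntax errors: missing semicolons (after my %popdensity and after the four print statements at the end), and the while loop's closing } belongs before the summary prints, not after them. Two assignments were also missing the $ sigil (popdensity{$county} instead of $popdensity{$county}). keys takes only one hash, but that isn't a problem here, since both hashes have the same keys.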
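To illustrate the keys point: since %water and %popdensity share the same county names, looping over the keys of just one of them reaches the values in both. The sketch below uses made-up data. It also uses printf rather than print, because print does not interpret format specifiers such as %.2f; with print, the format string is output as-is (literal %.2f%% included), followed by the number.

```perl
use strict;
use warnings;

# Two hashes keyed by the same county names (made-up values).
my %water      = ('A County' => 0.25, 'B County' => 0.10);
my %popdensity = ('A County' => 50,   'B County' => 120);

# keys() accepts exactly one hash; because both hashes share their keys,
# walking the keys of %water is enough to look up values in both.
for my $county (sort keys %water) {
    printf "%s: %.2f%% water, %.1f people/sq mile\n",
        $county, 100 * $water{$county}, $popdensity{$county};
}
```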

This version at least compiles:

#!/usr/bin/perl -w
use strict;
use warnings;

#initialize stuff
my %water;
my %popdensity;
my @fields;
my $county;
my $lowest_pop_county=0; 
my $lowest_pop=9999999;
my $highest_pop_county=0;
my $highest_pop=0;
my $lowest_water_county=0;
my $lowest_water = 1;
my $highest_water_county = 0;
my $highest_water = 0;

#parse input
while (<>)
{
next if /County Name/;
chomp;
@fields = split /,/;
$water{$fields[0]} = $fields[3] / ($fields[2] + $fields[3]);

$popdensity{$fields[0]} = $fields[1] / $fields[2];

foreach $county (keys %water)
{
    #print values
    # printf (not print) so that the %.2f format is actually applied
    printf "The percent water for $county is %.2f%%\n", 100 * $water{$county};
    print "The population per square mile of land for $county is $popdensity{$county}\n";

    #determine highest and lowest values
    if ($highest_pop < $popdensity{$county})
    { 
        $highest_pop = $popdensity{$county}; 
        $highest_pop_county = $county;
    } 
    if ($lowest_pop > $popdensity{$county})
    { 
        $lowest_pop = $popdensity{$county}; 
        $lowest_pop_county = $county; 
    }
    if ($highest_water < $water{$county})
    { 
        $highest_water = $water{$county}; 
        $highest_water_county = $county;
    }
    if ($lowest_water > $water{$county})
    { 
        $lowest_water = $water{$county}; 
        $lowest_water_county = $county; 
    }
}
} # while loop

#print highest and lowest values
print "Highest population density: $highest_pop_county, $highest_pop\n";
print "Lowest population density: $lowest_pop_county, $lowest_pop\n";
print "Highest percentage of water: $highest_water_county, $highest_water\n";
print "Lowest percentage of water: $lowest_water_county, $lowest_water\n";

Answer 1 (score: 1)

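You can (and really should) get rid of the foreach loop by tracking the highest/lowest population density and water percentage as you read each line. For example, if the new water % is greater than the highest so far, replace the old value with the new one. That way you always hold the highest water %. Do the same for the other three values. The foreach is very inefficient, because for every new county it iterates over all the keys seen so far.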
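The single-pass idea looks like this in isolation. A minimal sketch with made-up (county, water %) pairs standing in for parsed lines; starting the trackers as undef instead of sentinels such as 9999999 is a design choice of this sketch, so the first county always initializes both extremes regardless of the data's range.

```perl
use strict;
use warnings;

# Made-up (county, water %) pairs standing in for parsed input lines.
my @records = (['A County', 25.0], ['B County', 10.0], ['C County', 40.0]);

my ($max_name, $max_val, $min_name, $min_val);    # all start undef
for my $rec (@records) {
    my ($name, $val) = @$rec;
    # One comparison per extreme for each new record; no loop over the
    # counties already seen.
    if (!defined $max_val || $val > $max_val) {
        ($max_name, $max_val) = ($name, $val);
    }
    if (!defined $min_val || $val < $min_val) {
        ($min_name, $min_val) = ($name, $val);
    }
}
print "Highest: $max_name ($max_val)\n";
print "Lowest: $min_name ($min_val)\n";
```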

Your use of variables is fine, but I tend to use a hash of arrays (HoA) to track the high/low information. Here is an HoA structure:

my %hash = ('high_pop' => ['King County','912.87']);

You get the county name with $hash{high_pop}[0] and the population density with $hash{high_pop}[1].
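A short sketch of reading and writing that structure; only the King County entry comes from the example above, and the low_pop assignment is there to show autovivification.

```perl
use strict;
use warnings;

# The structure from the answer: key => [county name, value].
my %hash = ('high_pop' => ['King County', '912.87']);

# Read one element at a time, or unpack the whole inner array at once.
print "$hash{high_pop}[0]: $hash{high_pop}[1] people/sq mile\n";
my ($county, $density) = @{ $hash{high_pop} };

# Assigning through $hash{KEY}[N] creates the inner array on first use
# (autovivification), which the initialization below relies on.
$hash{low_pop}[1] = 9999999;
print 'low_pop now holds ', scalar @{ $hash{low_pop} }, " elements\n";
```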

Given the above, consider the following:

use strict;
use warnings;

my %hash;
$hash{high_pop}[1]   = 0;
$hash{low_pop}[1]    = 9999999;
$hash{high_water}[1] = 0;
$hash{low_water}[1]  = 100;

print "County\tPopulation/sq mile\t% Water\n";
while (<>) {
    next if $. == 1;
    chomp;

    my @fields    = split /,/;
    my $popSqMi   = sprintf '%.2f', $fields[1] / $fields[2];
    my $percntWat = sprintf '%.2f', ( $fields[3] / ( $fields[2] + $fields[3] ) ) * 100;

    print "$fields[0]\t$popSqMi\t$percntWat\n";

    if ( $popSqMi > $hash{high_pop}[1] ) {
        $hash{high_pop}[0] = $fields[0];
        $hash{high_pop}[1] = $popSqMi;
    }

    if ( $popSqMi < $hash{low_pop}[1] ) {
        $hash{low_pop}[0] = $fields[0];
        $hash{low_pop}[1] = $popSqMi;
    }

    if ( $percntWat > $hash{high_water}[1] ) {
        $hash{high_water}[0] = $fields[0];
        $hash{high_water}[1] = $percntWat;
    }

    if ( $percntWat < $hash{low_water}[1] ) {
        $hash{low_water}[0] = $fields[0];
        $hash{low_water}[1] = $percntWat;
    }
}

print "\nHighest population density: $hash{high_pop}[0], $hash{high_pop}[1]\n";
print "Lowest population density: $hash{low_pop}[0], $hash{low_pop}[1]\n";
print "Highest percentage of water: $hash{high_water}[0], $hash{high_water}[1]\n";
print "Lowest percentage of water: $hash{low_water}[0], $hash{low_water}[1]\n";
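One caveat on next if $. == 1: $. keeps counting across files read through <>, so if the script is ever given more than one file, only the first file's header is skipped. The idiom documented in perlfunc under eof is to close ARGV at the end of each file. A sketch with two made-up temporary files (the header column names are assumptions, apart from County Name):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two made-up input files, each beginning with a header line.
my @files;
for my $body ("County Name,Population,Land,Water\nA County,100,10,30\n",
              "County Name,Population,Land,Water\nB County,50,5,5\n") {
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $body;
    close $fh or die "close $path: $!";
    push @files, $path;
}

@ARGV = @files;    # what "perl script.pl file1 file2" would set up
my @names;
while (<>) {
    next if $. == 1;               # skip each file's header line
    chomp;
    push @names, (split /,/)[0];   # keep just the county name
} continue {
    close ARGV if eof;             # reset $. before the next file starts
}
print "$_\n" for @names;
```

Without the continue block, the second file's header would be line 3 and would be treated as a county.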

Hope this helps!