Counting the number of times an item is placed in a cell

Time: 2013-06-20 20:20:57

Tags: perl count grid cell repeat

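I am randomly populating a grid in which the Cartesian coordinates are normalized from 0 to 100 (a 100x100x100 grid) and the "intensity" of each data point is normalized from 0 to 256. Here is an excerpt of my Perl code: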

open(FILE,$file);
while(sysread(FILE,$data,16)) {
    @row=unpack("f>4",$data);   # input file is binary so I had to convert here
    $x=int((($row[0] - $xmin)/($xmax - $xmin)*10) + 0.5); # max and min variables
    $y=int((($row[1] - $ymin)/($ymax - $ymin)*10) + 0.5); # are previously defined
    $z=int((($row[2] - $zmin)/($zmax - $zmin)*10) + 0.5);
    $i=int(($row[3]/62*256) + 0.5);
    $i=255 if ($i>255);

    $out[$x][$y][$z]=$i;   # simply assigns an intensity for each data
                           # point (in random order), only 1 point can be
                           # added to each 1x1x1 cell
}

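Some points lie too close together and land in the same 1x1x1 cell. When that happens, each intensity added overwrites the previous one. How can I count the number of times more than one point is placed in a cell?

Thanks in advance!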

3 Answers:

Answer 0: (score: 1)

You can do this easily with a second hash: combine the three indices ($x, $y, $z) into a single key, and set the hash value to true whenever you insert a value.

my %visited_points; 

open(FILE,$file);
while(sysread(FILE,$data,16)) {
    @row=unpack("f>4",$data);   # input file is binary so I had to convert here
    $x=int((($row[0] - $xmin)/($xmax - $xmin)*10) + 0.5); # max and min variables
    $y=int((($row[1] - $ymin)/($ymax - $ymin)*10) + 0.5); # are previously defined
    $z=int((($row[2] - $zmin)/($zmax - $zmin)*10) + 0.5);
    $i=int(($row[3]/62*256) + 0.5);
    $i=255 if ($i>255);

    # join with a separator so that e.g. (1,11,0) and (11,1,0) get different keys
    my $key = "$x,$y,$z";
    # check if something already occupies this cell
    if( exists( $visited_points{$key} ) ) {
        # take some other action
    }

    $out[$x][$y][$z]=$i;   # simply assigns an intensity for each data
                           # point (in random order), only 1 point can be
                           # added to each 1x1x1 cell

    # mark that there is something in this cell
    $visited_points{$key} = 1;
}

If you want a count rather than a flag, increment the value ($visited_points{$key}++) instead of setting it to 1.
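As a rough sketch of that counting variant: the snippet below skips the binary file and uses a small made-up list of already-binned cell indices (`@cells` is purely illustrative), and the hash name `%points_in_cell` is an assumption, not something from the question. It counts how many points land in each cell, then derives the number of crowded cells and the number of overwrites.

```perl
use strict;
use warnings;

# Hypothetical sample of already-binned (x, y, z) cell indices.
my @cells = ([1, 11, 0], [11, 1, 0], [1, 11, 0], [3, 3, 3], [1, 11, 0]);

my %points_in_cell;   # "x,y,z" => number of points binned into that cell

for my $cell (@cells) {
    my $key = join ',', @$cell;   # separator keeps (1,11,0) and (11,1,0) distinct
    $points_in_cell{$key}++;
}

# Cells that received more than one point.
my @crowded = grep { $points_in_cell{$_} > 1 } keys %points_in_cell;

# Number of writes that overwrote an earlier intensity.
my $overwrites = 0;
$overwrites += $points_in_cell{$_} - 1 for keys %points_in_cell;

printf "%d crowded cell(s), %d overwrite(s)\n", scalar @crowded, $overwrites;
```

In the real loop the increment would sit next to the assignment to `$out[$x][$y][$z]`, and the two summaries would be computed once after the file has been read.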

Answer 1: (score: 1)

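To make this more HPC (high-performance computing) friendly, I found that instead of the $key and the if block, you can simply keep a count like this.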

open(FILE,$file);
while(sysread(FILE,$data,16)) {
    @row=unpack("f>4",$data);   # input file is binary so I had to convert here
    $x=int((($row[0] - $xmin)/($xmax - $xmin)*10) + 0.5); # max and min variables
    $y=int((($row[1] - $ymin)/($ymax - $ymin)*10) + 0.5); # are previously defined
    $z=int((($row[2] - $zmin)/($zmax - $zmin)*10) + 0.5);
    $i=int(($row[3]/62*256) + 0.5);
    $i=255 if ($i>255);

    $count[$x][$y][$z]+=1;

    $out[$x][$y][$z]=$i;   # simply assigns an intensity for each data
                           # point (in random order), only 1 point can be
                           # added to each 1x1x1 cell
}

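Then, if $count[$x][$y][$z] is greater than 1, more than one point was put into that bin. If it equals 1, exactly one point was placed there, and if it is less than 1 (in practice undefined, because nothing was ever added), the bin is empty.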
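To turn those per-bin counts into totals, one option is to walk the nested array once after the read loop. The sketch below is only illustrative: it fills `@count` from a hard-coded list of cell indices instead of the binary file, and the summary variables `$occupied`, `$crowded`, and `$overwrites` are names introduced here. Rows and planes that never received a point are undefined, so they are skipped explicitly.

```perl
use strict;
use warnings;

# Hypothetical counts, filled the same way as $count[$x][$y][$z] += 1 above.
my @count;
$count[$_->[0]][$_->[1]][$_->[2]] += 1
    for ([0, 0, 0], [0, 0, 0], [2, 5, 1], [2, 5, 1], [2, 5, 1], [4, 4, 4]);

my ($occupied, $crowded, $overwrites) = (0, 0, 0);

for my $plane (@count) {
    next unless defined $plane;          # no point ever landed at this x
    for my $row (@$plane) {
        next unless defined $row;        # no point ever landed at this (x, y)
        for my $n (@$row) {
            next unless $n;              # undef or 0 means the bin is empty
            $occupied++;
            next if $n == 1;             # exactly one point: nothing overwritten
            $crowded++;
            $overwrites += $n - 1;
        }
    }
}

print "occupied=$occupied crowded=$crowded overwrites=$overwrites\n";
```

Here `$crowded` is the number of bins holding more than one point, while `$overwrites` is the number of intensities that were lost; which of the two answers the question depends on how "number of times" is meant.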

Answer 2: (score: 0)

Another version of Hunter's solution replaces the hash (with an encoded key) with an array (with an encoded index).

Pros: likely to improve performance slightly. More likely than not, the margin is too small to matter, but be sure to benchmark it yourself.

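Cons: sacrifices memory. If your grid is sparsely populated, say 1,000 occupied cells out of 1 million, you will store 1,000 elements in the hash but 1,000,000 elements in the array.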

# my @visited_points;

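# Note: these multipliers assume each index is in 0..99; an index of 100
# would collide with a neighboring cell and needs a stride of 101 instead.
my $key = $x * 10000 + $y * 100 + $z;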

# Mark as visited
$visited_points[$key]++;

# Check if visited:
if (defined $visited_points[$key]) {
    # Bail out?
}

# Check how many times visited?
# Use ternary ?: operator to gracefully convert undef to 0
my $count = $visited_points[$key] ? $visited_points[$key] : 0;
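A minimal self-contained sketch of the encoded-index idea follows. It assumes each rounded index can be anything from 0 to 100 inclusive, so it uses a stride of 101; `$side` and the helper `cell_index` are hypothetical names introduced for this example. It also uses the defined-or operator `//` (available since Perl 5.10) as a shorter alternative to the ternary.

```perl
use strict;
use warnings;

my $side = 101;   # assumption: each rounded index falls in 0..100 inclusive

# Hypothetical helper: map (x, y, z) to a unique flat array index.
sub cell_index {
    my ($x, $y, $z) = @_;
    return ($x * $side + $y) * $side + $z;
}

my @visited_points;
$visited_points[ cell_index(@$_) ]++ for ([0, 1, 0], [0, 0, 100], [0, 1, 0]);

# Defined-or turns the undef of an untouched cell into 0.
my $twice = $visited_points[ cell_index(0, 1, 0) ]   // 0;
my $once  = $visited_points[ cell_index(0, 0, 100) ] // 0;
my $never = $visited_points[ cell_index(5, 5, 5) ]   // 0;

print "$twice $once $never\n";
```

With a stride of 100, the cells (0, 1, 0) and (0, 0, 100) would both map to index 100, which is why the stride is taken from the number of possible index values rather than from the nominal grid size.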