perl: persist a set of strings with commit support

Time: 2013-05-02 08:45:24

Tags: perl persist

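I have a set of strings that is modified inside a loop of 25k iterations. It is empty at the beginning, but in each iteration 0-200 strings are randomly added to or removed from it. At the end, the set contains about 80k strings. I want to make the process resumable: the set should be saved to disk after each iteration and loaded on resume. Which library can I use? The raw data amounts to about 16M, but the changes are usually small, and I don't want the whole store to be rewritten on every iteration.

Since the strings are paths, I am thinking of storing them in a log file like this: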

+a
+b
commit
-b
+d
commit

At startup the file is loaded into a hash and then compacted. If the file does not end with a commit line, the last block is ignored.
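The loading step described above can be sketched as follows. This is only an illustration under assumptions: the sub name `replay_log` is hypothetical, the log is read from an in-memory filehandle instead of a real file, and compaction is left out. Operations are buffered until a `commit` line is seen, so a trailing block without a commit never reaches the hash.

```perl
use strict;
use warnings;

# Replay a change log; returns a hash ref whose keys are the committed strings.
# Blocks that are not followed by a "commit" line are ignored.
# (replay_log is a hypothetical name used only for this sketch.)
sub replay_log {
    my ($fh) = @_;
    my (%set, @pending);
    while (defined(my $line = <$fh>)) {
        chomp $line;
        if ($line eq 'commit') {
            # Apply the buffered block in the order it was written.
            for my $op (@pending) {
                my ($sign, $path) = @$op;
                if ($sign eq '+') { $set{$path} = undef }
                else              { delete $set{$path} }
            }
            @pending = ();
        }
        elsif ($line =~ /^([+-])(.*)$/s) {
            push @pending, [ $1, $2 ];
        }
    }
    return \%set;
}

# The sample log from the question, plus an uncommitted trailing block.
my $log = "+a\n+b\ncommit\n-b\n+d\ncommit\n+e\n";
open(my $fh, '<', \$log) or die "cannot open in-memory log: $!";
my $set = replay_log($fh);
print join(',', sort keys %$set), "\n";   # prints "a,d"
```

Buffering each block until its commit line avoids having to undo partially applied changes afterwards.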

2 answers:

Answer 0: (score: 1)

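The Storable package brings persistence to your Perl data structures (SCALAR, ARRAY, HASH or REF objects), i.e. anything that can be conveniently stored to disk and retrieved at a later time.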
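As a rough illustration of what that looks like, the sketch below stores a small hash with Storable's `nstore` and reads it back with `retrieve`. The file name `paths.sto` and the sample paths are made up for the example. Note that each `nstore` call serialises the complete structure, so on its own this approach rewrites the whole file on every save, which the question wanted to avoid.

```perl
use strict;
use warnings;
use Storable   qw(nstore retrieve);
use File::Temp qw(tempdir);

# Scratch directory that is removed when the program exits.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/paths.sto";   # hypothetical file name

# The set is modelled as a hash whose keys are the strings.
my %paths = map { $_ => undef } qw(/tmp/a /tmp/b);
nstore(\%paths, $file);        # serialises the whole hash

delete $paths{'/tmp/b'};
$paths{'/tmp/d'} = undef;
nstore(\%paths, $file);        # writes the whole hash again

my $restored = retrieve($file);
print join(',', sort keys %$restored), "\n";   # prints "/tmp/a,/tmp/d"
```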

Answer 1: (score: 0)

I decided to put away the heavy artillery and write something simple:

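package LoL::IMadeADb;

use strict;
use warnings;
use File::Basename qw(dirname);   # dirname() is used in save()
use File::Copy     qw(move);      # move() is used in save()
use File::Temp     qw(tempfile);  # tempfile() is used in save()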

sub new {
  my $self;
  ( my $class, $self->{dbname} )  = @_;
  # open for read, then write. create if not exist
  #msg "open $self->{dbname}";
  open(my $fd, "+>>", $self->{dbname}) or die "cannot open < $self->{dbname}: $!";
  seek($fd, 0, 0);
  $self->{fd} = $fd;
  #msg "opened";
  $self->{paths} = {};
  my $href = $self->{paths};

  $self->{nlines} = 0;
  my $lastcommit = 0;
  my @pending;   # operations read since the last commit line
  my $line;
  # stop at EOF or at a torn final line that lacks its newline
  while (defined($line = <$fd>) && substr($line, -1) eq "\n") {
    chomp($line);
    my ( $c, $rest ) = ( substr($line, 0, 1), substr($line, 1) );
    if ($c eq "c") {
      # commit: apply the buffered block in the order it was written
      for my $op (@pending) {
        if ($op->[0] eq "+") {
          $href->{ $op->[1] } = undef;
        } else {
          delete $href->{ $op->[1] };
        }
      }
      $self->{nlines} += @pending + 1;
      @pending = ();
      $lastcommit = tell($fd);
      #msg "lastcommit: " . $lastcommit;
    } elsif ($c eq "+" || $c eq "-") {
      push @pending, [ $c, $rest ];
    }
  }
  # whatever follows the last commit line was never applied; cut it off
  seek($fd, 0, 2);
  if ($lastcommit < tell($fd)) {
    print STDERR "rolling back incomplete file: " . $self->{dbname} . "\n";
    truncate($fd, $lastcommit) or die "cannot truncate $self->{dbname}: $!";
    print STDERR "rolling back incomplete file; done\n";
  }
  # reposition before switching from reading to writing on the same handle
  seek($fd, 0, 2);
  #msg "entries = " . (keys( %{ $href })+0) . ", nlines = " . $self->{nlines} . "\n";
  bless $self, $class
}

sub add {
  my ( $self , $path ) = @_;
  if (!exists $self->{paths}{$path}) {
    $self->{paths}{$path} = undef;
    print { $self->{fd} } "+" . $path . "\n";
    $self->{nlines}++;
    $self->{changed} = 1;
  }
  undef
}

sub remove {
  my ( $self , $path ) = @_;
  if (exists $self->{paths}{$path}) {
    delete $self->{paths}{$path};
    print { $self->{fd} } "-" . $path . "\n";
    $self->{nlines}++;
    $self->{changed} = 1;
  }
  undef
}

sub save {
  my ( $self ) = @_;
  return undef unless $self->{changed};
  my $fd = $self->{fd};
  my @keys = keys %{$self->{paths}};
  if ( $self->{nlines} - @keys > 5000 ) {
    #msg "compacting";
    close($fd) or die "cannot close $self->{dbname}: $!";
    # keep the temp file next to the db so move() can be a plain rename
    my $bkpdir = dirname($self->{dbname});
    # tempfile() croaks by itself if the file cannot be created
    ($fd, my $bkpname) = tempfile(DIR => $bkpdir, SUFFIX => ".tmp");
    $self->{nlines} = 1;   # accounts for the trailing commit line
    for (@keys) {
      print { $fd } "+" . $_ . "\n" or die "cannot write backup file: $!";
      $self->{nlines}++;
    }
    print { $fd } "c\n" or die "cannot write backup file: $!";
    close($fd) or die "cannot close backup file: $!";
    move($bkpname, $self->{dbname})
      or die "cannot rename " . $bkpname . " => " . $self->{dbname} . ": $!";
    open($self->{fd}, ">>", $self->{dbname}) or die "cannot open >> $self->{dbname}: $!";
  } else {
    print { $fd } "c\n";
    $self->{nlines}++;

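    # flush: making $| true on the selected handle flushes it at once;
    # setting it back to false afterwards restores normal buffering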
    my $previous_default = select($fd);
    $| ++;
    $| --;
    select($previous_default);
  }
  $self->{changed} = 0;
  #print "entries = " . (@keys+0) . ", nlines = " . $self->{nlines} . "\n";
  undef
}
1;