Perl:2GB后的index / rindex?

时间:2015-02-24 00:35:17

标签: perl

索引和rindex在短字符串上运行良好:

$ perl -e '$a=("x"x((2<<30)-2))."b";print index($a,"b")'
2147483646
$ perl -e '$a=("x"x((2<<30)-2))."b";print rindex($a,"b")'
2147483646

然而,在长字符串上,它们的表现完全不同:

$ perl -e '$a=("x"x((2<<30)-1))."b";print index($a,"b")'
Segmentation fault (core dumped)
$ perl -e '$a=("x"x((2<<30)-1))."b";print rindex($a,"b")'
-1

这是预期的吗?有一个很好的解决方法吗?

$ perl -V
Summary of my perl5 (revision 5 version 18 subversion 2) configuration:

  Platform:
    osname=linux, osvers=3.2.0-58-generic, archname=x86_64-linux-gnu-thread-multi
    uname='linux brownie 3.2.0-58-generic #88-ubuntu smp tue dec 3 17:37:58 utc 2013 x86_64 x86_64 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.18 -Darchlib=/usr/lib/perl/5.18 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.18.2 -Dsitearch=/usr/local/lib/perl/5.18.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.18.2 -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.8.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=, so=so, useshrplib=true, libperl=libperl.so.5.18.2
    gnulibc_version='2.19'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'


Characteristics of this binary (from libperl): 
  Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS
                        PERL_DONT_CREATE_GVSV
                        PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
                        PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
                        PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_64_BIT_ALL
                        USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES
                        USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE
                        USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
                        USE_REENTRANT_API
  Locally applied patches:
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
    DEBPKG:fixes/respect_umask - Respect umask during installation
    DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
    DEBPKG:debian/extutils_set_libperl_path - EU:MM: Set location of libperl.a to /usr/lib
    DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
    DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
    DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option
    DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
    DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
    DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/702096 Point users to Debian packages of deprecated core modules
    DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
    DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
    DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.18.2-2ubuntu1 in patchlevel.h
    DEBPKG:debian/skip-kfreebsd-crash - http://bugs.debian.org/628493 [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD
    DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
    DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
    DEBPKG:debian/hurd_test_skip_stack - http://bugs.debian.org/650175 Disable failing GNU/Hurd tests dist/threads/t/stack.t
    DEBPKG:fixes/manpage_name_Test-Harness - http://bugs.debian.org/650451 [rt.cpan.org #73399] cpan/Test-Harness: add NAME headings in modules with POD
    DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/660195 [rt.cpan.org #28632] Make EU::MM pass LD through to recursive Makefile.PL invocations
    DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
    DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
    DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] http://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
    DEBPKG:fixes/net_ftp_failed_command - [rt.cpan.org #37700] http://bugs.debian.org/491062 Net::FTP: cope gracefully with a failed command
    DEBPKG:fixes/perlbug-patchlist - [3541c11] http://bugs.debian.org/710842 [perl #118433] Make perlbug look up the list of local patches at run time
    DEBPKG:fixes/module_metadata_security_doc - [68cdd4b] CVE-2013-1437 documentation fix
    DEBPKG:fixes/module_metadata_taint_fix - [bff978f] http://bugs.debian.org/722210 [rt.cpan.org #88576] untaint version, if needed, in Module::Metadata
    DEBPKG:fixes/IPC-SysV-spelling - http://bugs.debian.org/730558 [rt.cpan.org #86736] Fix spelling of IPC_CREAT in IPC-SysV documentation
    DEBPKG:fixes/fix-undef-source -
  Built under linux
  Compiled at Mar 27 2014 18:30:28
  %ENV:
    PERL_MB_OPT="--install_base "/home/tange/perl5""
    PERL_MM_OPT="INSTALL_BASE=/home/tange/perl5"
  @INC:
    /etc/perl
    /usr/local/lib/perl/5.18.2
    /usr/local/share/perl/5.18.2
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.18
    /usr/share/perl/5.18
    /usr/local/lib/site_perl
    .

2 个答案:

答案 0 :(得分:9)

This bug was fixed并将在5.22.0中,应该在5月份出来。它没有进入5.20.x系列。该补丁很简单,您应该能够将其应用于5.20.2或5.18.2代码库并重新编译。

正则表达式存在同样的问题,并且在5.21中也已修复。

解决方法是在记忆中没有2个演出,这是一般的好习惯。如果从文件中读取,可能read在块中读取并在每个块上使用索引。

如果您必须有一个2 gig字符串,请使用substr()以块的形式检查它。不幸的是,这意味着你必须将字符串复制成碎片。下面的代码故意在循环外初始化$substr,因此Perl不会多次重新分配内存。

use v5.18;
use strict;
use warnings;

sub big_index {
    my $ref = shift;
    my $match = shift;

    state $block_size = 2**29;

    my $strlen   = length($$ref);
    my $matchlen = length($match);

    # No point in doing extra work if we don't need to.
    return index($$ref, $match) if $strlen < $block_size;

    my $substr = '';
    my $offset = 0;
    for(
        my $offset = 0;
        $offset < $strlen;
        $offset += ($block_size - $matchlen - 1)
    ) {
        $substr = substr($$ref, $offset, $block_size);
        my $ret = index $substr, $match;
        return $ret + $offset if $ret != -1;
    }

    return -1;
}

答案 1 :(得分:0)

感谢@Schwern的基本代码。此版本避免直接使用substr复制2 GB字符串。它支持$position之类的其他index参数。我也做了rindex版本。

index / rindex之类的呼叫,除了对源字符串的引用进行调用外:

index64(\$string, $match);
index64(\$string, $match, $position);
rindex64(\$string, $match);
rindex64(\$string, $match, $position);


sub index64 {
    # Do index on strings > 2GB.
    # index in Perl < v5.22 does not work for > 2GB
    # Input:
    #   as index
    # Output:
    #   as index
    my $ref = shift;
    my $match = shift;
    my $pos = shift || 0;
    my $block_size = 2**31-1;
    my $strlen   = length($$ref);
    # No point in doing extra work if we don't need to.
    if($strlen < $block_size) {
        return index($$ref, $match, $pos);
    }

    my $matchlen = length($match);
    my $ret;
    my $offset = $pos;
    while($offset < $strlen) {
        $ret = index(
            substr($$ref, $offset, $block_size),
            $match, $pos-$offset);
        if($ret != -1) {
            return $ret + $offset;
        }
        $offset += ($block_size - $matchlen - 1);
    }
    return -1;
}

sub rindex64 {
    # Do rindex on strings > 2GB.
    # rindex in Perl < v5.22 does not work for > 2GB
    # Input:
    #   as rindex
    # Output:
    #   as rindex
    my $ref = shift;
    my $match = shift;
    my $pos = shift;
    my $block_size = 2**31-1;
    my $strlen = length($$ref);
    # Default: search from end
    $pos = defined $pos ? $pos : $strlen;
    # No point in doing extra work if we don't need to.
    if($strlen < $block_size) {
        return rindex($$ref, $match, $pos);
    }

    my $matchlen = length($match);
    my $ret;
    my $offset = $pos - $block_size + $matchlen;
    if($offset < 0) {
        # The offset is less than a $block_size
        # Set the $offset to 0 and
        # Adjust block_size accordingly
        $block_size = $block_size + $offset;
        $offset = 0;
    }
    while($offset >= 0) {
        $ret = rindex(
            substr($$ref, $offset, $block_size),
            $match);
        if($ret != -1) {
            return $ret + $offset;
        }
        $offset -= ($block_size - $matchlen - 1);
    }
    return -1;
}