Wrong offsets when extracting word offsets in Perl

Date: 2017-01-10 00:13:13

Tags: perl offset

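I have a program that reads two files. The first file contains terms (one or more words each) separated by semicolons (;), and the second file contains text. The goal is to determine the offsets of the terms from the first file within the text.

My program starts out fine: for "fluctuating vacuum" it extracts the correct offsets (2 20), and "quantum fields" is also correct (45 59). For later terms the offsets are wrong. For example, for "nuclear physics" the correct offsets are 396 411, but my code produces 399 414; for "Fermionic fields" my code produces 138 154, but the correct offsets are 135 151.

The code I use is: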

#!/usr/bin/perl
use strict;use warnings;

my @a = ();
my @b = ();
my @aa = ();
my $l=0;
my $v=1;
my $g=0;
my $kh;
my $ligne2;
my $texte;

open(FICC, $ARGV[0]);

print "choose the name of the file\n";
my $fic = <STDIN>;

open(FIC1, ">$fic");

while (<FICC>) {
    my $ligne2=$_;
    $a[$l]=$ligne2;
    $l++;
}

my $aa;
my $ligne;
my $rep = "C:\\scienceie2017_train\\train2";
opendir(REP,$rep) or die "E/S : $!\n";

foreach my $kh (@a) {
    chomp($kh);

    if ($kh=~/.*\.txt/) {
        $texte=$kh;
        #print "$kh";
        print FIC1 "$texte";
    }

    @aa=split(/;/,$kh);
    #$u++;
    while(defined(my $fic=readdir REP)){
        my $f="${rep}\\$texte";
        open FIC, "$f" or warn "$f E/S: $!\n";
        while(<FIC>){
            $ligne = $_;
            chomp($ligne);
            #print FIC1 "@aa";

            foreach my $che (@aa) {
                $che =~ s/^\s+//; 
                $che =~ s/\s+$//;
                if ($ligne =~/\Q$che\E/) {
                    print FIC1 "T$v\tTask $-[0] $+[0]\t$che\n";
                    $v++;
                }
            }
            $v = 1; 
        }
        print FIC1 "\n";
        close FIC;
        goto che
    }
che:
}

The text is:

  

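A fluctuating vacuum is a general feature of quantum fields, of which the free Maxwell field considered in  [1–12] is but one example. Fermionic fields such as that describing the electron, also undergo vacuum fluctuations, consequently one expects to find Casimir effects associated with such fields whenever they are confined in some way. Such effects were first investigated in the context of nuclear physics, within the so-called “MIT bag model” of the nucleon  [13]. In the bag-model one envisages the nucleon as a collection of fermionic fields describing confined quarks. These quarks are subject to a boundary condition at the surface of the ‘bag’ that represents the nucleon’s surface. Just as in the electromagnetic case, the bag boundary condition modifies the vacuum fluctuations of the field, which results in the appearance of a Casimir force  [14–18]. This force, although very weak at a macroscopic scale, can be significant on the small length scales encountered in nuclear physics. It therefore has important consequences for the physics of the bag-model nucleon  [19].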

The extracted terms are:

  

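fluctuating vacuum;general feature;quantum fields;free Maxwell;free Maxwell field;Maxwell;Maxwell field;Maxwell field;Maxwell field;field considered;considered; 1-12;Fermionic fields;vacuum fluctuations;Casimir;Casimir effects; Casimir effects; Casimir effects;such fields;such effects;nuclear physics;so-called “MIT;so-called “MIT bag;“MIT bag; MIT bag model;bag model”;fermionic fields boundary condition;nucleon’s surface;electromagnetic case;bag boundary;bag boundary condition;boundary condition;vacuum fluctuations;Casimir;Casimir force;force; 14-18;macroscopic scale;small length;small length scales;length scales;nuclear physics;important consequences;bag-model nucleon;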

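3 answers:

Answer 0: (score: 0)

Your code is not clear to me, but when I run the data you provided through my own code, I get the results below.

The two variables @- and @+ (used here as $-[0] and $+[0]) are described in perlvar under "Variables related to regular expressions" (@LAST_MATCH_START and @LAST_MATCH_END).

My code: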

#!/usr/bin/perl
use strict;
use warnings;

my $s = 'A fluctuating vacuum is a general feature ... (rest of line)';

my @terms = split /;/, 'fluctuating vacuum;Fermionic fields;nuclear physics;bag-model nucleon';

for my $term (@terms) {
    while ($s =~ /\Q$term\E/g) {    # \Q...\E matches the term literally, even if it contains regex metacharacters
        print "$-[0] - $+[0] $term\n";
    }   
}

Output:

2 - 20 fluctuating vacuum
135 - 151 Fermionic fields
396 - 411 nuclear physics
983 - 998 nuclear physics
1063 - 1080 bag-model nucleon

Answer 1: (score: 0)

#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the literal below contains non-ASCII characters; treat it as characters, not bytes

my $string = "A fluctuating vacuum is a general feature of quantum fields, of which the free Maxwell field considered in  [1–12] is but one example. Fermionic fields such as that describing the electron, also undergo vacuum fluctuations, consequently one expects to find Casimir effects associated with such fields whenever they are confined in some way. Such effects were first investigated in the context of nuclear physics, within the so-called “MIT bag model” of the nucleon  [13]. In the bag-model one envisages the nucleon as a collection of fermionic fields describing confined quarks. These quarks are subject to a boundary condition at the surface of the ‘bag’ that represents the nucleon’s surface. Just as in the electromagnetic case, the bag boundary condition modifies the vacuum fluctuations of the field, which results in the appearance of a Casimir force  [14–18]. This force, although very weak at a macroscopic scale, can be significant on the small length scales encountered in nuclear physics. It therefore has important consequences for the physics of the bag-model nucleon  [19].";

my @extracted_terms = ( "fluctuating vacuum", "Fermionic fields", "nuclear physics", "bag-model nucleon" );

for my $term ( @extracted_terms )
{
    # index returns the offset of the first occurrence only, or -1 if the term is absent
    my $position = index $string, $term;

    printf( "%s, %s\n", $position, $position + length($term) );
}
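Note that index reports only the first occurrence, while a term such as "nuclear physics" appears twice in the text. A minimal sketch of collecting every occurrence with index is shown here; the helper name all_offsets and the shortened ASCII sample text are assumptions made for illustration, not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns [start, end] pairs for every
# occurrence of $term in $text, using repeated calls to index.
sub all_offsets {
    my ($text, $term) = @_;
    return if $term eq '';          # an empty term would never stop matching
    my @found;
    my $pos = index $text, $term;
    while ($pos != -1) {
        push @found, [ $pos, $pos + length $term ];
        # Resume the search one character after the previous hit.
        $pos = index $text, $term, $pos + 1;
    }
    return @found;
}

# Shortened, ASCII-only sample text (an assumption, not the real input).
my $text = 'nuclear physics and more nuclear physics';
printf "%d %d\n", @$_ for all_offsets($text, 'nuclear physics');
```

For real input containing non-ASCII characters, the text would still need to be decoded first, otherwise the positions are byte offsets.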

Answer 2: (score: 0)

You have to open your file as UTF-8.

Replace

open FIC, "$f" or warn "$f E/S: $!\n";

with

open FIC, "<:encoding(UTF-8)", "$f" or warn "$f E/S: $!\n";
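The reason the encoding layer matters can be sketched as follows, assuming the input file is UTF-8 encoded. Without a layer Perl reads raw bytes, so @- and @+ report byte offsets, and every multi-byte character before the match (such as the en dash in [1–12]) pushes the offsets forward. The sample string and the offsets helper below are hypothetical and only illustrate the shift.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                         # this source file itself is UTF-8
use Encode qw(encode decode);

# Hypothetical sample: an en dash (3 bytes in UTF-8) precedes the term.
my $chars = "see [1–12]. Fermionic fields";
my $bytes = encode('UTF-8', $chars);   # what a handle without a layer would return

my $term = 'Fermionic fields';

# Hypothetical helper: start and end offsets of the first literal match.
sub offsets {
    my ($text, $term) = @_;
    return $text =~ /\Q$term\E/ ? ($-[0], $+[0]) : ();
}

my @byte_off = offsets($bytes, $term);                    # counted in bytes
my @char_off = offsets(decode('UTF-8', $bytes), $term);   # counted in characters

print "bytes: @byte_off\n";
print "chars: @char_off\n";
```

In this sample the byte offsets come out two higher than the character offsets, because the en dash occupies three bytes but counts as one character.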