Perl脚本搜索/替换和转换结果

时间:2016-05-26 20:34:51

标签: regex perl

我正在运行一个简单的Perl脚本,该脚本会复制从allocateDirect()\txt开头的所有行。到现在为止还挺好。

\xtx

现在我想&#34;擦洗&#34;所有以use strict; use warnings; $^I = '.bak'; while ( <> ) { s/(\\txt )(.*)/$1$2\n\\xtx $2/g; print; }

开头的新行
  1. 删除所有非单词字符:任何非字母字符,但保留带有变音符号的字符

  2. 将所有内容转换为小写。

  3. 这就是我的基本编程技巧的结束

    我的文本文件如下所示:

    \\xtx

    我的脚本到目前为止产生:

    \txt Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics  
    \abc More text ...
    

    我想实现:

    \txt Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics  
    \xtx Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics  
    \abc More text ...
    

    非常感谢任何帮助!

    编辑:
    这是一个真实的示例字符串:

    \txt Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics  
    \xtx text with symbols and numbers and cháractẽrs with diacrítics  
    \abc More text ...
    

    ......除了以\ txt ...

    开头的行外,一切都应保持原样

3 个答案:

答案 0 :(得分:2)

你可以尝试这种转换

Perl

use strict;
use warnings;

binmode (DATA, ":utf8");
binmode (STDOUT, ":utf8");

while (<DATA>) {
   s/^(\\txt )(.*)/GetConvetedLine($1,$2)/me;
   print; 
}

sub GetConvetedLine
{
    my ($txt, $body) = @_;
    my $newbody = $body;
    $newbody =~ s/[^\pL\s]+//g;
    $newbody =~ s/\s+/ /g;
    $newbody = lc($newbody);
    return $txt . $body . "\n" . "\\xtx " . $newbody;
}


__DATA__    
\txt Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics

输出

\txt Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics
\xtx text with symbols and numbers and cháractẽrs with diacrítics

答案 1 :(得分:1)

你可以稍微重构一下,所以你不必把它全部放在一个正则表达式中:

use strict;
use warnings;    
$^I = '.bak';
while (<>) {
    print; 
    if(/^\\txt/) {
        s/^\\txt//; # remove \txt
        s/[\[\]\(\)//g; # remove all unwanted characters
        print "\xtx $_";
    }
    print; 
}

答案 2 :(得分:1)

为了记录,这是我最终使用的sln的答案的(略微修改过的)版本。它的调用方式与我用perl script.pl myfile.txt调用原始脚本的方式相同:

#!/usr/bin/perl

use strict;
use warnings;
use open qw(:std :utf8);
$^I = '.bak'; # create a backup copy 

while (<>) {
   s/^(\\txt )(.*)/GetConvetedLine($1,$2)/me;
   print;
}

sub GetConvetedLine
{
    my ($txt, $body) = @_;
    my $newbody = $body;
    $newbody =~ s/[^\pL\s]+//g;
    $newbody =~ s/ \s+/ /g;
    $newbody = lc($newbody);
    return $txt . $body . "\n" . "\\xtx " . $newbody;
}