我正在运行一个简单的Perl脚本,该脚本会复制从allocateDirect()
到\txt
开头的所有行。到现在为止还挺好。
\xtx
现在我想"擦洗"所有以use strict;
use warnings;
$^I = '.bak';
while ( <> ) {
s/(\\txt )(.*)/$1$2\n\\xtx $2/g;
print;
}
和
删除所有非单词字符:任何非字母字符,但保留带有变音符号的字符
将所有内容转换为小写。
这就是我的基本编程技巧的结束
我的文本文件如下所示:
\\xtx
我的脚本到目前为止产生:
\txt Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics
\abc More text ...
我想实现:
\txt Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics
\xtx Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics
\abc More text ...
非常感谢任何帮助!
编辑:
这是一个真实的示例字符串:
\txt Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics
\xtx text with symbols and numbers and cháractẽrs with diacrítics
\abc More text ...
......除了以\ txt ...
开头的行外,一切都应保持原样答案 0 :(得分:2)
你可以尝试这种转换
Perl
use strict;
use warnings;
binmode (DATA, ":utf8");
binmode (STDOUT, ":utf8");
while (<DATA>) {
s/^(\\txt )(.*)/GetConvetedLine($1,$2)/me;
print;
}
sub GetConvetedLine
{
my ($txt, $body) = @_;
my $newbody = $body;
$newbody =~ s/[^\pL\s]+//g;
$newbody =~ s/\s+/ /g;
$newbody = lc($newbody);
return $txt . $body . "\n" . "\\xtx " . $newbody;
}
__DATA__
\txt Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics
输出
\txt Text (.) with [ symbols and Num[bers (.2) and cháractẽrs with diacrítics
\xtx text with symbols and numbers and cháractẽrs with diacrítics
答案 1 :(得分:1)
你可以稍微重构一下,所以你不必把它全部放在一个正则表达式中:
use strict;
use warnings;
$^I = '.bak';
while (<>) {
print;
if(/^\\txt/) {
s/^\\txt//; # remove \txt
s/[\[\]\(\)//g; # remove all unwanted characters
print "\xtx $_";
}
print;
}
答案 2 :(得分:1)
为了记录,这是我最终使用的sln的答案的(略微修改过的)版本。它的调用方式与我用perl script.pl myfile.txt
调用原始脚本的方式相同:
#!/usr/bin/perl
use strict;
use warnings;
use open qw(:std :utf8);
$^I = '.bak'; # create a backup copy
while (<>) {
s/^(\\txt )(.*)/GetConvetedLine($1,$2)/me;
print;
}
sub GetConvetedLine
{
my ($txt, $body) = @_;
my $newbody = $body;
$newbody =~ s/[^\pL\s]+//g;
$newbody =~ s/ \s+/ /g;
$newbody = lc($newbody);
return $txt . $body . "\n" . "\\xtx " . $newbody;
}