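I'm new to regular expressions, and I'm fairly sure this has been answered somewhere, but I haven't managed to adapt what I found. I'm working with a dictionary file that contains duplicate headwords, which makes the compiler fail. So I need to match an exact headword at the start of a line (headwords never contain characters such as "[" or "<") and remove the duplicates. The file contains a great many duplicated headwords, so I want to automate the find-and-replace. Here is an example from the dictionary: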
aGga
<© aGga @>
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?[/trn][/m]
aGga
<© aGga @>
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i][/trn][/m]
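Here I need to match the identical headword ("aGga") and then delete its second, third, etc. occurrences (here, the second "aGga") together with the line that follows each of them, which is always enclosed in < and > ("<© aGga @>"), producing the desired output: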
aGga
<© aGga @>
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?[/trn][/m]
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i][/trn][/m]
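I have seen headwords that occur three times, so I need to handle more than a single repeat of any given headword.
My attempts so far (for example "^(.+?\s)", based on this question) just match far too much when I try to find identical headwords. I mostly use the regex find-and-replace in Sublime Text, but I'm happy to do this any way possible. I know this is probably very simple and boring for regex experts, so thank you for taking the time to help a novice.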
Answer 0 (score: 4)
One way with Perl:
# Entries are separated by a blank line; the \R\R in the pattern below relies on that.
my $data = 'aGga
<© aGga @>
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?

aGga
<© aGga @>
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i]

aGga
<© aGga @>
[m2][trn][i]m. pl. No of a people and their country.[/i]

gubo
<© gubo @>
kjhkjhkj hkjhk jhk kjhkjh khk hkjh kj hkj';
$data =~ s/^
(?|
\G(?!\A) ([^[<\s]+) \R <©\ \1\ @> # contiguous
|
([^[<\s]+) \R <©\ \1\ @> \K # new item
)
( (?>\R.+)* ) # block: group 2
(?: \R\R (?= \1 \R <©[^>]+@> $ ) )?
/$2/gmx;
print $data;
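The substitution above is fairly dense, so as a cross-check here is a rough line-by-line sketch of the same clean-up in Python (not the Perl pattern itself). It assumes what the sample shows: a headword line is immediately followed by a "<© headword @>" line, and repeats of a headword directly follow the previous entry. The names MARKER and drop_repeated_headers are made up for this illustration.

```python
import re

# Assumed marker format, taken from the sample: "<© head @>" on the line after the headword.
MARKER = re.compile(r"<©\s*(.+?)\s*@>")


def drop_repeated_headers(text: str) -> str:
    """Drop a headword line plus its marker line when it repeats the previous headword."""
    lines = text.split("\n")
    out = []
    last_head = None
    i = 0
    while i < len(lines):
        line = lines[i]
        head = line.strip()
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        m = MARKER.fullmatch(nxt.strip())
        # A header is a non-empty line not starting with "[" or "<",
        # followed by a marker line naming the same headword.
        is_header = (
            bool(head)
            and not head.startswith(("[", "<"))
            and m is not None
            and m.group(1) == head
        )
        if is_header:
            if head == last_head:
                # Repeat: remove any blank separator already emitted,
                # then skip the headword line and the marker line.
                while out and out[-1].strip() == "":
                    out.pop()
                i += 2
                continue
            last_head = head
        out.append(line)
        i += 1
    return "\n".join(out)
```

Calling drop_repeated_headers on the whole file contents returns the cleaned text; the bodies of repeated entries end up directly under the first headword.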
Answer 1 (score: 3)
Edit: some utf8 open/close stuff
# Open a temp file for writing as utf8
# Output to this file will be automatically encoded from Perl internal to utf8 octets
# Write the internal string
# Check the file with a utf8 editor
# ----------------------------------------------
open (my $out, '>:utf8', 'temp.txt') or die "can't open temp.txt for writing $!";
print $out $internal_string_1;
close $out;
# Open the temp file for reading as utf8
# All input from this file will be automatically decoded from utf8 octets to Perl internal
# Read/decode to a different internal string
# ----------------------------------------------
open (my $in, '<:utf8', 'temp.txt') or die "can't open temp.txt for reading $!";
$/ = undef;
my $internal_string_2 = <$in>;
close $in;
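Sorry this is so long.
Here is one way; it uses a global substitution with a callback.
For this to work, the blocks must be sequential (entries with the same headword must directly follow one another).
If the blocks are not sequential, the solution has to be extended.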
# /((?<=^)\s*)^([^<\[\n]+?)(\s*\n\s*<.*>.*(?:\n|$))/
( # (1 start), Ws trim
(?<= ^ )
\s*
) # (1 end)
^ # BOL
( [^<\[\n]+? ) # (2), Head
( # (3 start), Angle head
\s* \n \s* < .* > .*
(?: \n | $ ) # Newline or EOL
) # (3 end)
Perl example:
use strict;
use warnings;
$/ = undef;
#my $filehandle = open(..);
#my $data = <$filehandle>;
my $data = <DATA>;
my $lasthead = "";
sub StripDupHead
{
my ($wstrim, $head, $angle_head ) = @_;
if ( $head eq $lasthead ) {
return "";
}
$lasthead = $head;
return $wstrim . $head . $angle_head;
}
$data =~ s/((?<=^)\s*)^([^<\[\r\n]+?)(\s*\r?\n\s*<.*>.*(?:\r?\n|$))/StripDupHead($1,$2,$3)/emg;
print $data, "\n";
# print $filehandle $data, "\n";
# close ($filehandle);
__DATA__
aGga
<© aGga @>
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?[/trn][/m]
aGga
<© aGga @>
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i][/trn][/m]
bGgb
<© bGgb @>
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?[/trn][/m]
cGgc
<© cGgc @>
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i][/trn][/m]
cGgc
<© cGgc @>
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?[/trn][/m]
cGgc
<© cGgc @>
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i][/trn][/m]
Output:
aGga
<© aGga @>
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?[/trn][/m]
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i][/trn][/m]
bGgb
<© bGgb @>
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?[/trn][/m]
cGgc
<© cGgc @>
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i][/trn][/m]
[m1]aṅgá [/m]
[m2][trn][i]pel. ¤1.¤ emphatic[/i]: just, only; especially; ¤2,¤ [i]exhortative[/i]: [i]w. voc. or impv.[/i]; ¤3.¤ [i]intr.[/i]: [/trn][/m]
[m2][trn][b]kim aṅga,[/b] how much more?[/trn][/m]
[m1]áṅga [/m]
[m2][trn][i]m. pl. No of a people and their country.[/i][/trn][/m]
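As noted in this answer, the callback approach only handles duplicates that directly follow each other. One possible way to extend it to scattered duplicates, sketched here in Python rather than Perl, is to collect the body lines per headword and write each headword out once. This is only an illustration under assumptions: the header format is taken from the sample, blank lines are treated as separators and re-emitted as a single blank line between entries, anything before the first headword is dropped, and the names MARKER and merge_entries are hypothetical.

```python
import re

# Assumed marker format, taken from the sample: "<© head @>" on the line after the headword.
MARKER = re.compile(r"<©\s*(.+?)\s*@>")


def merge_entries(text: str) -> str:
    """Group body lines by headword, keeping headwords in first-seen order."""
    bodies = {}   # headword -> list of body lines (dicts keep insertion order)
    markers = {}  # headword -> first marker line seen for it
    current = None
    lines = text.split("\n")
    i = 0
    while i < len(lines):
        line = lines[i]
        head = line.strip()
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        m = MARKER.fullmatch(nxt.strip())
        if head and not head.startswith(("[", "<")) and m and m.group(1) == head:
            # Header found: switch to (or create) the bucket for this headword.
            current = head
            bodies.setdefault(head, [])
            markers.setdefault(head, nxt)
            i += 2
            continue
        if current is not None and head:
            # Non-blank body line: file it under the current headword.
            bodies[current].append(line)
        i += 1
    entries = ["\n".join([head, markers[head], *body]) for head, body in bodies.items()]
    return "\n\n".join(entries)
```

Unlike the substitution above, this reorders the file: a later block for an already-seen headword is moved up under its first occurrence, which may or may not be what the dictionary compiler expects.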