I recently created a Perl script that searches for words that begin with D and end with E, using the following code:
$infile = 'words.txt';
open(IN, $infile);
$count = 0;
while ($word = <IN>) {
    chomp($word);
    if ($word =~ /^d\w*e$/i) {
        print "$word\n";
        $count++;
    }
}
print "$count\n";
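I recently decided to fork the code and create a script that searches for six-letter words whose letters are in alphabetical order (A to Z). Instead of words.txt, I plan to use the standard Unix dictionary located at /usr/share/dict/words. How can I modify this code to do that?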
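Answer 0 (score: 9)
It looks like what you really need is an algorithm for checking whether the letters in a given word are in alphabetical order. There are several ways to do it, but this subroutine works by splitting the word into a list of its constituent characters, sorting that list, and joining it back together. If the result matches the original word, the word was already sorted.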
use strict;
use warnings;
use feature 'fc';
for (qw/ a ab ba cab alt effort toffee /) {
    print "$_\n" if in_alpha_order($_);
}
sub in_alpha_order {
    my $word = fc(shift);
    my $new = join '', sort $word =~ /./g;
    return $new eq $word;
}
Output
a
ab
alt
effort
If you really want to do this with a regular expression, you can build an alternation like
a(?=[a-z]) | b(?=[b-z]) | c(?=[c-z]) ...
Here is a program that works this way. Its output is identical to the output above.
use strict;
use warnings;
my $regex = join '|', map "${_}(?=[$_-z])", 'a'..'z';
$regex = qr/^(?:$regex)*.$/i;
for (qw/ a ab ba cab alt effort toffee /) {
    print "$_\n" if $_ =~ $regex;
}
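To tie the regex approach back to the original question, one option is to add a length lookahead to the same pattern. The following is only a sketch and not part of the answer above: the helper name six_letter_alpha_order is hypothetical, and it assumes the dictionary is at /usr/share/dict/words with one word per line.

```perl
use strict;
use warnings;

# Same alternation as above: each letter must be followed by an equal or later letter.
my $step = join '|', map "${_}(?=[$_-z])", 'a'..'z';

# The leading lookahead restricts matches to exactly six ASCII letters.
my $regex = qr/^(?=[a-z]{6}\z)(?:$step)*.$/i;

# Hypothetical helper: returns 1 if $word has six letters in alphabetical order.
sub six_letter_alpha_order {
    my ($word) = @_;
    return $word =~ $regex ? 1 : 0;
}

# Assumed dictionary location; the loop is skipped if the file cannot be opened.
my $dict = '/usr/share/dict/words';
if (open my $fh, '<', $dict) {
    while (my $word = <$fh>) {
        chomp $word;
        print "$word\n" if six_letter_alpha_order($word);
    }
    close $fh;
}
```

With this pattern, "effort" and "Almost" are accepted, while "toffee" (out of order) and "alt" (too short) are rejected.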
Answer 1 (score: 2)
To support non-ASCII words:
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Collate; # To sort letters alphabetically
use constant NCHARS => 6; # Consider only words with NCHARS characters in them
# Encode output as UTF-8 (replaces the deprecated 'encoding' pragma)
binmode(STDOUT, ':encoding(UTF-8)');
my $filename = '/usr/share/dict/words';
open (my $fh, '<:encoding(UTF-8)', $filename)
    or die "can't open '$filename' $!";
my $collator = Unicode::Collate::->new();
while (my $word = <$fh>) {
    chomp $word;
    my @chars = ($word =~ /\X/g); # Split word into characters
    # Print words with given length that have characters in alphabetical order
    print "$word\n" if (@chars == NCHARS &&
                        join('', $collator->sort(@chars)) eq $word);
}
close $fh;
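A small sketch of why a collator matters here, using made-up sample characters rather than anything from the answer: Perl's built-in sort compares code points, so an accented letter lands after "z", whereas Unicode::Collate places it next to its base letter.

```perl
use strict;
use warnings;
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

# Sample characters chosen for illustration; "\x{E9}" is e with an acute accent.
my @chars = ('z', "\x{E9}", 'a');

# Built-in sort orders by code point: a (0x61), z (0x7A), then U+00E9.
my $by_codepoint = join '', sort @chars;

# The collator orders by the Unicode Collation Algorithm: a, e-acute, z.
my $by_collation = join '', Unicode::Collate->new->sort(@chars);

print "code point order: $by_codepoint\n";
print "collation order:  $by_collation\n";
```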
Answer 2 (score: 1)
Here's an option:
#!/usr/bin/env perl
use warnings;
use strict;
my $wordsFile = '/usr/share/dict/words';
my $count = 0;
open my $fh, '<', $wordsFile or die $!;
while ( my $word = <$fh> ) {
    chomp $word;
    next unless length $word == 6;
    my $sorted = join '', sort split //, lc $word;
    if ( $sorted eq lc $word ) {
        print "$word\n";
        $count++;
    }
}
close $fh;
print "$count\n";
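This uses split to break the lowercased word into letters, which are then sorted alphabetically. The letters are put back together with join to form a new word, which is compared against the lowercased original. If the two are the same, the word is printed and counted.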
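The steps described above can be traced one at a time. This is just an illustrative walk-through with a sample word of my own choosing, not additional code from the answer:

```perl
use strict;
use warnings;

my $word   = 'Almost';             # sample input chosen for illustration
my $lower  = lc $word;             # "almost"
my @chars  = split //, $lower;     # ('a', 'l', 'm', 'o', 's', 't')
my @sorted = sort @chars;          # already in order, so unchanged
my $joined = join '', @sorted;     # "almost"

# The word qualifies when sorting did not move any letter.
print "$word is in alphabetical order\n" if $joined eq $lower;
```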
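Answer 3 (score: 1)
I have a solution similar to Kenosis's and Borodin's, but you need to be careful about letter case. Perl's default sort function puts all uppercase letters before lowercase letters. My version below takes care of this.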
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'fc'; # fc() is only available once this feature is enabled (Perl 5.16+)
sub is_six_letter_word {
    my $word = shift;
    return length($word) == 6;
}
sub is_letters_in_alphabetical_order {
    my $word = shift;
    $word = fc($word); # Fold case so that uppercase letters do not sort first
    my @chars = split("", $word);
    my $sorted_word = join("", sort(@chars));
    return $word eq $sorted_word;
}
# The input file is given as the first command-line argument
open(my $fh_in, '<', $ARGV[0]) or die "Error opening input file: $!";
while (my $word = <$fh_in>) {
    chomp($word);
    if (is_six_letter_word($word) && is_letters_in_alphabetical_order($word)) {
        printf("%s\n", $word);
    }
}
close($fh_in);
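The case issue mentioned in this answer can be seen with a short experiment. This is a sketch with a sample word picked for illustration; it assumes Perl 5.16 or later for fc.

```perl
use strict;
use warnings;
use feature 'fc';

my $word = 'Fact';    # sample input chosen for illustration

# Without folding, 'F' (code point 70) sorts before 'a' (97), so the
# word looks sorted even though f comes after a in the alphabet.
my $naive = join '', sort split //, $word;       # "Fact"

# With folding, "fact" is compared against its sorted form "acft".
my $folded = fc $word;
my $sorted = join '', sort split //, $folded;    # "acft"

print "naive check:  ", ($naive  eq $word   ? "in order" : "not in order"), "\n";
print "folded check: ", ($sorted eq $folded ? "in order" : "not in order"), "\n";
```

The naive check wrongly reports the word as ordered, while the folded check rejects it.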