Question

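I have been working on this for quite a while now. I have characters from the Latin alphabet and want them encoded as strings of uppercase letters only. Is there a module that can do this? Or is there any BaseX encoding that I could modify to use only uppercase characters?

I have currently implemented part of it with regex substitutions, but that covers only a few characters and is definitely not efficient :)

Anyway, if there is no module or function that handles this, is there a way to make it more efficient with a regex?

I thought of tr/[\+,\-,...]/[PLUS,MINUS,...]/cds;

But it seems tr only replaces a char with a char, not a char with a sequence of chars :(

Any ideas?

Achim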

Answer 1

To answer the tr question:

my %subs = ( '+' => 'PLUS' );
my $pat = join '|', map quotemeta, keys %subs;
s/($pat)/$subs{$1}/g;
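
As a quick illustration of the substitution above, here is a minimal, self-contained sketch. The `MINUS` entry and the helper name `spell_out` are made up for this example and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical mapping; extend it with whatever symbols you need.
my %subs = ( '+' => 'PLUS', '-' => 'MINUS' );

# Build one alternation from the keys; quotemeta escapes regex metacharacters.
my $pat = join '|', map { quotemeta } keys %subs;

# Replace every mapped symbol with its spelled-out word.
sub spell_out {
    my ($text) = @_;
    $text =~ s/($pat)/$subs{$1}/g;
    return $text;
}

print spell_out('A+B-C'), "\n";   # prints APLUSBMINUSC
```

Note that this mapping is one-way as written: nothing in the output distinguishes a substituted PLUS from the literal word PLUS in the input, so decoding would need some extra convention.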

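Base 26 can be done, but it is a bit harder and less efficient to implement because 26 is not a power of 2. It is definitely what you want, though. I will look into the encoding.

In the meantime, here is a base 16 solution: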

sub bytes_to_base16 {
   my $e = unpack('H*', $_[0]);   # hex-encode the argument, not $_
   $e =~ tr/0123456789ABCDEFabcdef/ABCDEFGHIJKLMNOPKLMNOP/;
   return $e;
}

sub base16_to_bytes {
   my $e = $_[0];
   $e =~ tr/ABCDEFGHIJKLMNOP/0123456789ABCDEF/;
   return pack('H*', $e);   # pack the translated copy, not $_
}
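
To see the base 16 idea round-trip, here is a small standalone sketch of the same technique. The function names `to_letters16` and `from_letters16` are placeholders chosen for this example; the mapping is 0-9 to A-J and a-f to K-P, as in the answer.

```perl
use strict;
use warnings;

# Encode bytes as letters A-P: hex-encode, then map hex digits to letters.
sub to_letters16 {
    my ($bytes) = @_;
    my $hex = unpack 'H*', $bytes;    # lowercase hex, two digits per byte
    $hex =~ tr/0-9a-f/A-P/;
    return $hex;
}

# Decode: map letters back to hex digits, then pack the hex into bytes.
sub from_letters16 {
    my ($letters) = @_;
    $letters =~ tr/A-P/0-9a-f/;
    return pack 'H*', $letters;
}

my $encoded = to_letters16('Hi+');
print "$encoded\n";                    # prints EIGJCL
print from_letters16($encoded), "\n";  # prints Hi+
```

Every input byte becomes exactly two letters, so the output is always twice the input length.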

Let's see how efficient base 26 is compared with base 16:

$ perl -MMath::BigInt -MMath::BigFloat -E'
   my $n = Math::BigInt->new(1);
   my $bs = 0;
   for (1..10) {
      $n <<= 8;
      ++$bs;
      my $bd16 = 2*$bs;
      my $bd26 = Math::BigFloat->new($n)->blog(26, 5)->bceil->numify;
      say sprintf "%2d bytes takes %2d base16 digits or %2d base26 digits.".
                  " base26 is %3.0f%% of the size of base16.",
         $bs, $bd16, $bd26, $bd26/$bd16*100;
      }
'
 1 bytes takes  2 base16 digits or  2 base26 digits. base26 is 100% of the size of base16.
 2 bytes takes  4 base16 digits or  4 base26 digits. base26 is 100% of the size of base16.
 3 bytes takes  6 base16 digits or  6 base26 digits. base26 is 100% of the size of base16.
 4 bytes takes  8 base16 digits or  7 base26 digits. base26 is  88% of the size of base16.
 5 bytes takes 10 base16 digits or  9 base26 digits. base26 is  90% of the size of base16.
 6 bytes takes 12 base16 digits or 11 base26 digits. base26 is  92% of the size of base16.
 7 bytes takes 14 base16 digits or 12 base26 digits. base26 is  86% of the size of base16.
 8 bytes takes 16 base16 digits or 14 base26 digits. base26 is  88% of the size of base16.
 9 bytes takes 18 base16 digits or 16 base26 digits. base26 is  89% of the size of base16.
10 bytes takes 20 base16 digits or 18 base26 digits. base26 is  90% of the size of base16.

An efficient implementation (one that encodes 4 bytes at a time) produces slightly less efficient output.

$ perl -MMath::BigInt -MMath::BigFloat -E'
   my $bs = 0;
   for (1..10) {
      ++$bs;
      my $bd16 = 2*$bs;
      my $bd26 = int($bs/4)*7 + ($bs%4)*2;
      say sprintf "%2d bytes takes %2d base16 digits or %2d base26 digits.".
                  " base26 is %3.0f%% of the size of base16.",
         $bs, $bd16, $bd26, $bd26/$bd16*100;
      }
'
 1 bytes takes  2 base16 digits or  2 base26 digits. base26 is 100% of the size of base16.
 2 bytes takes  4 base16 digits or  4 base26 digits. base26 is 100% of the size of base16.
 3 bytes takes  6 base16 digits or  6 base26 digits. base26 is 100% of the size of base16.
 4 bytes takes  8 base16 digits or  7 base26 digits. base26 is  88% of the size of base16.
 5 bytes takes 10 base16 digits or  9 base26 digits. base26 is  90% of the size of base16.
 6 bytes takes 12 base16 digits or 11 base26 digits. base26 is  92% of the size of base16.
 7 bytes takes 14 base16 digits or 13 base26 digits. base26 is  93% of the size of base16.
 8 bytes takes 16 base16 digits or 14 base26 digits. base26 is  88% of the size of base16.
 9 bytes takes 18 base16 digits or 16 base26 digits. base26 is  89% of the size of base16.
10 bytes takes 20 base16 digits or 18 base26 digits. base26 is  90% of the size of base16.

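Note that the efficient implementation uses an extra digit for inputs that are 7 bytes long.

So, is it worth the effort to use base26 instead of base16? Probably not, unless every byte is really precious.

Finally, here is a base 26 implementation.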

my @syms = ('A'..'Z');
my %syms = map { $syms[$_] => $_ } 0..$#syms;

sub bytes_to_base26 {
   my $e = '';

   my $full_blocks = int(length($_[0]) / 4);
   for (0..$full_blocks-1) {
      my $block = unpack('N', substr($_[0], $_*4, 4));
      $e .= join '', @syms[
         $block / 26**6 % 26,
         $block / 26**5 % 26,
         $block / 26**4 % 26,
         $block / 26**3 % 26,
         $block / 26**2 % 26,
         $block / 26**1 % 26,
         $block / 26**0 % 26,
      ];
   }

   my $extra = substr($_[0], $full_blocks*4);
   for my $block (unpack('C*', $extra)) {
      $e .= join '', @syms[
         $block / 26**1 % 26,
         $block / 26**0 % 26,
      ];
   }

   return $e;
}

sub base26_to_bytes {
   my $d = '';

   my $full_blocks = int(length($_[0]) / 7);
   for (0..$full_blocks-1) {
      my $block = 0;
      $block = $block*26 + $syms{$_} for unpack '(a)*', substr($_[0], $_*7, 7);
      $d .= pack('N', $block);
   }

   my $extra = substr($_[0], $full_blocks*7);
   my @extra = unpack('(a)*', $extra);
   while (@extra) {
      my $block = 0;
      $block = $block*26 + $syms{ shift(@extra) };
      $block = $block*26 + $syms{ shift(@extra) };
      $d .= pack('C', $block);
   }

   return $d;
}

Answer 2

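The simplest approach is to use base16 encoding, as others have suggested, and remap the digits to letters. But then you are using only 16 of the 26 characters, which is wasteful.

The most efficient encoding would probably be base26, but that would be quite difficult: in effect, you treat the entire input as one large binary number and convert it from base 2 to base 26.

log2(26) is just over 4.7, so at best (in the absence of compression) you can encode 4.7 bits per letter. A less wasteful encoding might encode 4 bytes (32 bits) in 7 letters. 7 letters give you about 32.9 bits of information, so you do not lose much. It can all be done with 32-bit arithmetic. You would then have to decide what to do if the input is not a multiple of 4 bytes.

(The actual implementation is left as an exercise, at least for now.)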
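
The arithmetic behind the 4-bytes-in-7-letters estimate can be checked with a few lines of Perl. This is only a sketch: `letters_needed` is a name invented here, and the brute-force loop is meant for small byte counts where the powers still fit comfortably in native numbers.

```perl
use strict;
use warnings;

# Information carried by one letter of a 26-letter alphabet, in bits.
my $bits_per_letter = log(26) / log(2);
printf "bits per letter:   %.3f\n", $bits_per_letter;
printf "bits in 7 letters: %.2f\n", 7 * $bits_per_letter;

# Smallest number of letters n such that 26**n >= 256**nbytes.
# Intended for small nbytes only (native arithmetic).
sub letters_needed {
    my ($nbytes) = @_;
    my $n = 0;
    $n++ while 26**$n < 256**$nbytes;
    return $n;
}

# 26**6 = 308915776 is below 2**32 = 4294967296, but 26**7 = 8031810176
# is above it, so a 4-byte block needs 7 letters.
printf "%d byte(s) need %d letter(s)\n", $_, letters_needed($_) for 1 .. 4;
```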

Answer 3

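You can use Base32 encoding, which uses the 26 uppercase letters plus 6 digits:

http://pastebin.com/YPvfrpHW

Just change the $code array to whatever character set you want to use.

Edit: Oops, I just noticed you are using Perl rather than PHP, sorry. You should be able to find a Base32 module on CPAN that does the same thing.

Edit 2: FWIW, I see Convert::Base32, Encode::Base32, and MIME::Base32 on CPAN.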

Answer 4

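For a bit of fun, here is my Enigma simulator. There is no easy way to do what you are trying to do, because the wheels have no escape character, and any sequence you invent to represent an escape sequence would significantly reduce the strength of the cipher.

However, 8-bit Latin input can be mapped from 0-255 to [A-P][A-P] using 65+($Char&15).65+($Char>>4) and reversed on output, but Q-Z would be wasted and the input would contain a lot of weaknesses, although gzipping it first could address that.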
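
The nibble mapping above can be sketched as follows. The function names are hypothetical, and the sketch follows the formula literally, so the low nibble comes first in each letter pair.

```perl
use strict;
use warnings;

# Map each byte to two letters in A-P: low nibble first, then high nibble.
sub nibbles_to_letters {
    my ($bytes) = @_;
    return join '', map {
        my $c = ord;
        chr(65 + ($c & 15)) . chr(65 + ($c >> 4))
    } split //, $bytes;
}

# Reverse the mapping: rebuild each byte from a pair of letters.
sub letters_to_nibbles {
    my ($letters) = @_;
    my $out = '';
    while ($letters =~ /([A-P])([A-P])/g) {
        $out .= chr( (ord($1) - 65) | ((ord($2) - 65) << 4) );
    }
    return $out;
}

print nibbles_to_letters('Az'), "\n";      # prints BEKH
print letters_to_nibbles('BEKH'), "\n";    # prints Az
```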

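The Germans usually used X for a space and spelled out punctuation when it was really necessary, trying to avoid spelling the same thing the same way twice. I know it is annoying, but that is how it was. If we increased the number of letters on the wheels, it would no longer be an Enigma machine!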

#!/usr/bin/perl
#Tinigma 2010 Usage:tinigma.pl 123 rng ini "GHWVYYDVPQGEWQWVT"
($n,$o,$p)=map(ord()-65,split//,uc$ARGV[1]);($z,$y,$x)=map(ord
()-65,split//,uc$ARGV[2]);($l,$m,$r)=map$_-1,split//,$ARGV[0];
$t=uc$ARGV[3];$t=~s/[^A-Z]//g;$b=26;$j=0;@N=qw(7 25 11 6 1);@R
=('EKMFLGDQVZNTOWYHXUSPAIBRCJ'x3,'AJDKSIRUXBLHWTMCQGZNPYFVOE'x
3,'BDFHJLCPRTXVZNYEIWGAKMUSQO'x3,'ESOVPZJAYQUIRHXLNFTGKDCMWB'x
3,'VZBRGITYUPSDNHLXAWMJQOFECK'x3,'YRUHQSLDPXNGOKMIEBFZCWVJAT'x
3);@t=split//,$t;for$v(@R){$i=0;for(split//,$v){$c=ord($_)-65;
$F[$j][$i]=$c;$R[$j][$c+$b*(int($i/$b))]=$i;$i++}$j++}@S=@{$F[
5]};$f=$y==$F[$m][$N[$m]]?1:0;$i=0;for(@t){if($f){$y++;$y%=$b;
$z++;$z%=$b;$f=0}if($x==$F[$r][$N[$r]]){$y++;$y%=$b;if($y==$F[
$m][$N[$m]]){$f=1}}$x++;$x%=$b;$e.=chr(($R[$r][$R[$m][$R[$l][$
S[$F[$l][$F[$m][$F[$r][ord($_)-39+$x-$n]-$x+$n+$y-$o]-$y+$o+$z
-$p]-$z+$p]+$z-$p]-$z+$p+$y-$o]-$y+$o+$x-$n]-$x+$n)%$b+65)}
print"$e\n"

Answer 5

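Keith Thompson and jrockway have already briefly mentioned this solution.
Here we look at it more closely and implement it.

The problem is simple once you know the following:

- Any file (binary or text) can be seen as one big number.
  Think of the file's bytes as the digits of a number in base 2⁸ = 256.
- Any number can be converted to any natural base N ≥ 2.
- The N digits of a base-N number can be chosen freely.
  Usually we use 0, 1, 2, …, but A, B, C, … or any other symbols work as well.

So one way to encode a (text) file using only A-Z is:

- Read the file as one big number F.
- Print F in base 26 using the digits A-Z.

Here is an implementation:

#! /usr/bin/env perl
use strict;
use warnings;
use Math::BigInt try => 'GMP';

our $plaintextDigits = join('', map(chr, 0..255));
our $codeDigits = join('', 'A'..'Z');

sub baseConversion {
  my ($str, $inDigits, $outDigits) = @_;
  return Math::BigInt
    ->from_base($str, length $inDigits, $inDigits)
    ->to_base(length $outDigits, $outDigits);
}

sub encode {
  return baseConversion shift, $main::plaintextDigits, $main::codeDigits;
}

sub decode {
  return baseConversion shift, $main::codeDigits, $main::plaintextDigits;
}

my $input = 'String to be encoded. Or use `shift` to read an CLI argument.';
print "input:\n$input\n";
my $encoded = encode $input;
print "\nencoded:\n$encoded\n";
my $decoded = decode $encoded;
print "\ndecoded:\n$decoded\n";

This prints the encoding ESQEKWWQLSBQHVKBCAQYKLXMVQRUFOOMPJGFTADLYTDQLFGTRTLWJBYTJICKUOFUVPHSHZHCRZKFMVSHRHCACZFUWTXVXUDRVKMIAIKK and then decodes it correctly.

Advantages:

- The encoded text is as space-efficient as possible. The only way to get a shorter encoding is to compress the input before encoding it.
- In theory, encoding and decoding are very simple, because they are just base conversions.

Disadvantages:

The conversion is not very efficient:
- The file has to fit into memory.
- The time complexity of the conversion is superlinear in the input size. Because 26 is not a power of 2, we have to divide the input number (which has the size of the file!) once for every output digit.
  In practice, however, this is probably acceptable, since I suspect you only deal with short strings. On my system (i5-4570, 3.2 GHz), the unoptimized implementation above encoded and decoded 1 kB instantly. Without GMP, 10 kB took 11 seconds. With GMP, 100 kB took 10 seconds.

Implementation notes:

For simplicity, I made some assumptions about the input and about perl's internal string representation (which may differ between systems). For a more robust solution, you should use perl's utf8::encode / decode on the input and on our $...Digits.
from_base and to_base could probably be implemented more efficiently, because the default implementation may not know that the digits are consecutive.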
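
The note about utf8::encode / decode could look roughly like the following sketch. It assumes a Math::BigInt version recent enough to provide from_base and to_base with a collation sequence, as the implementation above also does; the names `encode_text` and `decode_text` are invented for this example. Like the implementation above, it would lose leading NUL bytes, since they act as leading zeros of the big number.

```perl
use strict;
use warnings;
use Math::BigInt;

my $byte_digits = join '', map { chr } 0 .. 255;   # one "digit" per byte value
my $code_digits = join '', 'A' .. 'Z';

# Characters -> UTF-8 bytes -> big number -> base 26 letters.
sub encode_text {
    my ($text) = @_;
    utf8::encode($text);                           # in place: characters to bytes
    return Math::BigInt->from_base($text, 256, $byte_digits)
                       ->to_base(26, $code_digits);
}

# Base 26 letters -> big number -> UTF-8 bytes -> characters.
sub decode_text {
    my ($code) = @_;
    my $bytes = Math::BigInt->from_base($code, 26, $code_digits)
                            ->to_base(256, $byte_digits);
    utf8::decode($bytes);                          # in place: bytes to characters
    return $bytes;
}

my $sample  = "Gr\x{fc}\x{df}e \x{263a}";          # contains non-ASCII characters
my $encoded = encode_text($sample);
print "$encoded\n";
print decode_text($encoded) eq $sample ? "round trip ok\n" : "round trip FAILED\n";
```

Encoding to UTF-8 first means every input character, including those above code point 255, is reduced to bytes before the base conversion.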

Answer 6

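Base16 (hexadecimal) encoding produces output drawn from 16 possible characters. Since the alphabet has 26 letters, you can swap the digits for letters. You will then be using only 16 letters of the alphabet, but you will have a string that contains only letters, and it is easy to encode and to decode back to the original string. It is an odd question (it looks like homework), but this will do the trick.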

Answer 7

You have specified a very lossy translation... which may not be satisfactory.

But:


    #!/usr/bin/perl  
    use strict;  
    use warnings;  
    use 5.012;  

    # 7095527/perl-how-to-encode-and-decode-characters-in-uppercase-alpha-letters-only  

    my $string = "abcDEFghijklMNO1234567890pqr+_)!@#}{?";  
    my @arr = split //, uc($string);  

    my (@intermediate, $char);  
    for my $char(@arr) {  
        if ($char =~ /[A-Z]/) {  
            say "ENIGMA char found (possibly uc'ed): $char";  
        } else {  
            say "WTF? \$char at line17 is !~ /[A-Z]/: $char";  
            next;  
        }  
    }  

    =head OUTPUT:  

    > SO7095527.pl  
    ENIGMA char found (possibly uc'ed): A
    ENIGMA char found (possibly uc'ed): B
    ENIGMA char found (possibly uc'ed): C
    ENIGMA char found (possibly uc'ed): D
    ENIGMA char found (possibly uc'ed): E
    ENIGMA char found (possibly uc'ed): F
    ENIGMA char found (possibly uc'ed): G
    ENIGMA char found (possibly uc'ed): H
    ENIGMA char found (possibly uc'ed): I
    ENIGMA char found (possibly uc'ed): J
    ENIGMA char found (possibly uc'ed): K
    ENIGMA char found (possibly uc'ed): L
    ENIGMA char found (possibly uc'ed): M
    ENIGMA char found (possibly uc'ed): N
    ENIGMA char found (possibly uc'ed): O
    WTF? $char at line17 is !~ /[A-Z]/: 1
    WTF? $char at line17 is !~ /[A-Z]/: 2
    WTF? $char at line17 is !~ /[A-Z]/: 3
    WTF? $char at line17 is !~ /[A-Z]/: 4
    WTF? $char at line17 is !~ /[A-Z]/: 5
    WTF? $char at line17 is !~ /[A-Z]/: 6
    WTF? $char at line17 is !~ /[A-Z]/: 7
    WTF? $char at line17 is !~ /[A-Z]/: 8
    WTF? $char at line17 is !~ /[A-Z]/: 9
    WTF? $char at line17 is !~ /[A-Z]/: 0
    ENIGMA char found (possibly uc'ed): P
    ENIGMA char found (possibly uc'ed): Q
    ENIGMA char found (possibly uc'ed): R
    WTF? $char at line17 is !~ /[A-Z]/: +
    WTF? $char at line17 is !~ /[A-Z]/: _
    WTF? $char at line17 is !~ /[A-Z]/: )
    ...

    =cut

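Note that the message becomes useless to a U-boat captain if it specifies the refueling position as "73N 39W"...

Perl: How to encode and decode characters using uppercase letters only

7 answers: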