Question

I have strings like this:

^[U0422^Z ^[U041D^Z^[U0410^Z ^[U0412^Z^[U042B^Z^[U0417^Z === Т НА ВЫЗ

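and so on. I want to run sed on these strings to replace each ^[Uxxxx^Z code with the character it stands for.

How can I do this if sed only accepts 2-digit hex codes? I have 3 GB of data encoded like this, and I need to do it in a script because I have multiple files and 152 different characters to decode...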
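Why two-digit hex escapes fall short can be seen with a short Perl sketch. Each ^[Uxxxx^Z code is a Unicode code point, and in UTF-8 (assumed here as the target encoding) the Cyrillic letters take two bytes each, so a byte-oriented sed replacement would have to spell out every byte for each of the 152 codes. The helper name `utf8_bytes` is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: show which UTF-8 bytes each ^[Uxxxx^Z code stands for.
# Assumption: the decoded text is meant to be stored as UTF-8.
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 bytes of code point $hex as "D0 A2".
sub utf8_bytes {
    my ($hex) = @_;
    my $bytes = encode('UTF-8', chr(hex $hex));
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

for my $code (qw(0422 041D 0410 00F8)) {
    printf "^[U%s^Z -> U+%s -> %s\n", $code, $code, utf8_bytes($code);
}
```

For example, it prints `^[U0422^Z -> U+0422 -> D0 A2`: one code becomes two bytes, which is why a tool that decodes the hex value directly is simpler than one sed expression per character.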

Answer 1

You can use Perl, for example:

file.txt:

Żelazna ręka Marsa - J^[U00F8^Zrstad, Jarl. ^[U0422^Z ^[U041D^Z^[U0410^Z ^[U0412^Z^[U042B^Z^[U0417^Z

script.pl:

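#!/usr/bin/perl
use strict;
use warnings;

# Usage: ./script.pl <input> <output>
die "usage: $0 <input> <output>\n" unless @ARGV == 2;

open my $in,  '<:encoding(UTF-8)', $ARGV[0] or die "$ARGV[0]: $!";
open my $out, '>:encoding(UTF-8)', $ARGV[1] or die "$ARGV[1]: $!";

# Replace every ^[Uxxxx^Z code with the character U+xxxx.
while (<$in>) {
    s/\^\[U([0-9A-Fa-f]{4})\^Z/chr(hex $1)/ge;
    print $out $_;
}

close $in;
close $out or die "$ARGV[1]: $!";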

Usage: ./script.pl <input> <output>

Output:

$ ./script.pl file.txt output.txt && cat output.txt
Żelazna ręka Marsa - Jørstad, Jarl. Т НА ВЫЗ
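The substitution can also be wrapped in a sub and checked against the sample text without touching any files. This sketch uses `chr(hex $1)`, which yields the same character as the answer's `sprintf "%c", hex($1)`. It also assumes the markers might be raw control bytes rather than literal text: `^[` and `^Z` are the caret notation that tools such as `cat -v` and `less` use for ESC (0x1B) and SUB (0x1A), so the pattern accepts both forms. The sub name `decode_codes` is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: the answer's substitution as a reusable sub.
# Assumption: markers may be the literal text "^[" / "^Z" or the raw
# control bytes ESC (0x1B) / SUB (0x1A) shown in caret notation.
use strict;
use warnings;

sub decode_codes {
    my ($s) = @_;
    # /e evaluates chr(hex $1) for each match, turning "0422" into U+0422.
    $s =~ s/(?:\^\[|\x1B)U([0-9A-Fa-f]{4})(?:\^Z|\x1A)/chr(hex $1)/ge;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';
print decode_codes('J^[U00F8^Zrstad, Jarl. ^[U0422^Z ^[U041D^Z^[U0410^Z'), "\n";
print decode_codes("\x1BU0412\x1A\x1BU042B\x1A\x1BU0417\x1A"), "\n";
```

If the files really contain ESC/SUB bytes, the answer's original pattern (which matches only literal `^` and `[` characters) would leave them untouched, so it is worth checking with `cat -v` or a hex dump first.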

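Version for multiple files (processes every *.txt file in the current directory; it does not descend into subdirectories):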

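#!/usr/bin/perl
use strict;
use warnings;

# Usage: ./script.pl <prefix>
# Converts every *.txt file in the current directory (not subdirectories).
my $prefix = shift or die "usage: $0 <prefix>\n";

for my $file (glob '*.txt') {
    next if $file =~ /^\Q$prefix\E_/;    # skip outputs of earlier runs

    open my $in,  '<:encoding(UTF-8)', $file or die "$file: $!";
    open my $out, '>:encoding(UTF-8)', "${prefix}_$file"
        or die "${prefix}_$file: $!";

    while (<$in>) {
        s/\^\[U([0-9A-Fa-f]{4})\^Z/chr(hex $1)/ge;
        print $out $_;
    }
    close $in;
    close $out;
}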

Usage: ./script.pl <prefix>. For each input file, e.g. data.txt, the result is written to prefix_data.txt.
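If the files really are spread across subdirectories, a recursive variant can be built on the core File::Find module. This is a sketch under assumptions not in the answer: every *.txt file below the start directory is UTF-8, each result is written next to its input as `<prefix>_<name>`, and the sub names `decode_file` and `convert_tree` are hypothetical.

```perl
#!/usr/bin/perl
# Sketch: recursive variant using the core File::Find module.
# Assumptions: all *.txt files below the start directory are UTF-8, and
# each result goes next to its input as <prefix>_<name>.
use strict;
use warnings;
use File::Find;
use File::Spec;

# Decode every ^[Uxxxx^Z code in $src and write the result to $dst.
sub decode_file {
    my ($src, $dst) = @_;
    open my $in,  '<:encoding(UTF-8)', $src or die "$src: $!";
    open my $out, '>:encoding(UTF-8)', $dst or die "$dst: $!";
    while (my $line = <$in>) {
        $line =~ s/\^\[U([0-9A-Fa-f]{4})\^Z/chr(hex $1)/ge;
        print $out $line;
    }
    close $in;
    close $out or die "$dst: $!";
}

# Collect the *.txt files first, then convert them, so that newly written
# output files are not picked up by the traversal itself.
sub convert_tree {
    my ($prefix, $dir) = @_;
    my @files;
    find(sub {
        # Skip outputs from earlier runs so they are not converted again.
        push @files, $File::Find::name
            if -f $_ && /\.txt\z/ && !/^\Q$prefix\E_/;
    }, $dir);
    for my $path (@files) {
        my ($vol, $dirs, $name) = File::Spec->splitpath($path);
        decode_file($path, File::Spec->catpath($vol, $dirs, "${prefix}_$name"));
    }
    return scalar @files;
}

if (@ARGV) {
    my ($prefix, $dir) = @ARGV;
    my $n = convert_tree($prefix, defined $dir ? $dir : '.');
    print "converted $n file(s)\n";
} else {
    print STDERR "usage: $0 <prefix> [dir]\n";
}
```

Usage would be `./script.pl prefix` for the current directory or `./script.pl prefix /path/to/data` for another tree. Gathering the file list before writing anything avoids converting outputs produced during the same run.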
