## 3602 Example Page
### Title
'Example Page' => 'Página de ejemplo',
### Body
'This is an example of a string that came from an example page.' => 'Este es un ejemplo de una cadena que proviene de una página de ejemplo.',
'Parsing this would be relatively simple, except that
there are carriage returns thrown into the text without warning.' => 'Parsear esto sería relativamente simple, excepto que
hay retornos de carro lanzados en el texto sin previo aviso.',
### Extended
## 3704 About Us
### Title
'About Us' => 'Sobre nosotros',
### Body
'This text takes the place of text which would identify the client.' => 'Este texto toma el lugar del texto que identificaría al cliente.',
q{I passed the English text though Google Translate. Don't think for a moment that these passages are professionally translated!} => q{Pasé el texto en inglés a través de Google Translate. ¡No piense por un momento que estos pasajes son traducidos profesionalmente!},
### Extended
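What I want to do is write a Perl script that parses this file, finds each page in the CMS, replaces the original English strings with the translated strings, and then saves the page in the CMS for later publication.
The CMS I am using has a Perl API, so the whole script is written in Perl.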
while (defined($current_line = <FILE>)) {
    chomp $current_line;

    # We need to parse the file, line-by-line, to determine what each line represents.
    # If the $current_phrase is populated at the beginning of the case statement,
    # we know that the
    # When we start parsing, $current_page_id is zero (0). If we hit a page selector and
    # the page ID is something other than zero, we need to save the previous page.
    if (length($current_phrase) > 0) {
        if ($current_line =~ /(.*\')\s=>\'(.*)/) {
            $current_phrase .= "\n" . $1;
        }
    } elsif ($current_line =~ qr/##\s(\d+)\s.+/mp) {
        # $1 is the page ID number.
        if ($current_page_id != int($1)) {
            print "\nPage $1 selector\n";
            $current_page_id = int($1);
            $current_page_change_count = 0;
            $current_page_section_name = '';
            $current_page_section_content = '';
            $current_phrase = '';
        }
    } elsif ($current_line =~ qr/###\s(.+)/mp) {
        # $1 is the name of the page section.
        # We have to figure out if the page section is the same as the one that we
        # have been processing.
        print "\nPage Section Delimiter: " . $1 . "\n";
        if ($1 ne $current_page_section_name) {
            # Since $1 is not $current_page_section_name, we need to put
            # $current_page_section_content into the page section where it belongs.
            # $current_page_section_name refers to the section of the page with changes.
            $current_page_section_name = $1;
        }
    } elsif (($current_line =~ qr/'((?:(?>[^'\\]*)|\\.)*)' => '((?:(?>[^'\\]*)|\\.)*)',/mp) || ($current_line =~ qr/q\{((?:(?>[^}\\]*)|\\.*))} => q\{((?:(?>[^}\\]*)|\\.*))},/mp)) {
        # The complex regular expression above is intended to capture multi-line
        # variants of either the 'phrase' or q{phrase} pattern.
        # See https://stackoverflow.com/questions/23086883/perl-multiline-string-regex
        # for some idea how the single quote pattern was found. We had to work up the
        # q{phrase} pattern ourselves.
        print "Phrase " . $current_page_change_count . ", original: " . $1 . ", change to: " . $2 . "\n\n";
    } elsif (($current_line =~ qr/^\s+?\'(.+)[^\'],?\s?/mp) || ($current_line =~ qr/^\s+?q\{(.+)[^}],?\s?/mp)) {
        # The biggest unresolved issue with the while loop is how
        # to identify the unterminated strings that begin with
        # a single quote or the q{ construct.
        # The regular expression above is an attempt to match both cases.
        # Eventually, I will have to search for the end of the
        # string, the => construct, and the translated phrase.
        print "Unterminated string: " . $current_line . "\n";
    } elsif (($current_line =~ qr/^\s+/mp) || (length($current_line) == 0)) {
        print "Blank line.\n";
    } else {
        # Want to ignore, not print this.
        print "Something else: \'" . $current_line . "\'\n";
    }
}

print "\nTotal lines: " . $total_lines . "\n";
print "\nTotal blank lines: " . $total_blank_lines . "\n";
print "Total change count: " . $total_change_count . "\n";
Answer 0 (score: 2)
Your input has Perl-style # comments, Perl-style fat commas (to associate the English text with the foreign text), and even Perl's q{} construct.
@sections = split /^(\s*#[^\n]*)/m, $INPUT;  # $INPUT is the whole file
foreach $section (@sections) {
    next unless $section =~ /\S/;
    if ($section =~ /^\s*##\s(\d+)\s.+/) {
        $page_number = $1;
    } elsif ($section =~ /^\s*###\s(.+)/) {
        $page_section = $1;
    } elsif ($section =~ /=>/) {
        %phrases = eval( "($section)" );
        # manipulate keys and values of phrases
    }
}
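To try the split-and-eval idea end to end, here is a minimal, self-contained sketch. The sample text in `$INPUT` is a shortened, made-up variant of the file in the question, and `parse_translations` is a hypothetical helper name, not part of any CMS API. Keep in mind that string `eval` executes whatever Perl the file contains, so this only makes sense for translation files you trust.

```perl
use strict;
use warnings;

# Hypothetical sample input, shortened from the file shown in the question.
my $INPUT = <<'END';
## 3602 Example Page
### Title
'Example Page' => 'Página de ejemplo',
### Body
'Parsing this would be relatively simple, except that
there are carriage returns.' => 'Parsear esto sería relativamente simple, excepto que
hay retornos de carro.',
q{Don't think twice.} => q{No lo pienses dos veces.},
END

# Returns a hash reference: { page number => { section name => { english => translation } } }
sub parse_translations {
    my ($input) = @_;
    my (%pages, $page_number, $page_section);

    # The capturing group makes split return the heading lines as well
    # as the text between them.
    for my $section (split /^(\s*#[^\n]*)/m, $input) {
        next unless $section =~ /\S/;
        if ($section =~ /^\s*##\s(\d+)\s.+/) {
            $page_number = $1;
        } elsif ($section =~ /^\s*###\s(.+)/) {
            $page_section = $1;
        } elsif ($section =~ /=>/) {
            # String eval runs arbitrary Perl: only use it on trusted files.
            my %phrases = eval "($section)";
            die "Could not parse section: $@" if $@;
            $pages{$page_number}{$page_section} = \%phrases;
        }
    }
    return \%pages;
}

my $pages = parse_translations($INPUT);
for my $page (sort keys %$pages) {
    for my $name (sort keys %{ $pages->{$page} }) {
        my $count = keys %{ $pages->{$page}{$name} };
        print "Page $page, section $name: $count phrase(s)\n";
    }
}
```

Because Perl itself parses each section, multi-line strings and `q{}` strings containing apostrophes need no special handling. The trade-off is that a phrase line beginning with `#` would be mistaken for a heading by the `split` pattern.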
{"source":"en-US", "dest":"es-ES",
[{"pageTitle":"Example Page", "pageNumber":3602,
"sections":[{"sectionName":"Title", "phrases":{
"Example Page":"Página de ejemplo"}},
"This is an example of a string that came from an example page.":
"Este es un ejemplo de una cadena que proviene de una página de ejemplo.",
... }}]]}
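If the translations were delivered as JSON along the lines of the outline above, they could be loaded without `eval`. The following is only a sketch: the outline leaves its array unnamed and elides most of its content, so the top-level `pages` key, the second section object, and the `count_phrases` helper are all assumptions made for illustration. `JSON::PP` ships with Perl 5.14 and later.

```perl
use strict;
use warnings;
use JSON::PP;    # core module since Perl 5.14; exports decode_json

# Hypothetical well-formed variant of the outline above.
# The "pages" key is an assumption; the outline does not name the array.
my $json_text = <<'END';
{"source":"en-US", "dest":"es-ES",
 "pages":[{"pageTitle":"Example Page", "pageNumber":3602,
   "sections":[
     {"sectionName":"Title", "phrases":{"Example Page":"Página de ejemplo"}},
     {"sectionName":"Body", "phrases":{
       "This is an example of a string that came from an example page.":
       "Este es un ejemplo de una cadena que proviene de una página de ejemplo."}}
   ]}]}
END

# decode_json expects UTF-8 encoded bytes and returns Perl data structures.
my $data = decode_json($json_text);

# Counts every phrase pair across all pages and sections.
sub count_phrases {
    my ($doc) = @_;
    my $count = 0;
    for my $page (@{ $doc->{pages} }) {
        for my $section (@{ $page->{sections} }) {
            $count += keys %{ $section->{phrases} };
        }
    }
    return $count;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $page (@{ $data->{pages} }) {
    for my $section (@{ $page->{sections} }) {
        for my $english (sort keys %{ $section->{phrases} }) {
            print "$page->{pageNumber} / $section->{sectionName}: ",
                  "$english => $section->{phrases}{$english}\n";
        }
    }
}
```

With this layout, embedded newlines travel as `\n` escapes inside JSON strings, so the unterminated-string problem from the line-by-line parser does not arise.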