Question

I need some help with a text cleaning/normalization process.

I've reached the point where I need to convert currency formats:

Input: $100 million
Output: 100 million dollars

Input: eur20 million
Output: 20 million euros

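I'm using Perl regular expressions for the cleaning process. It would help a lot if someone could give me a regex that converts the input into the output.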

Here is my code so far:

s/([\$])([0-9\.])([million])/ $2 $3 dollars/g;

An example number is $4.2 million.

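This is my attempt to convert the dollar sign into the word "dollar" and move it to the end of the phrase, but it doesn't give the expected result: I get ".2 million" as the output.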

Answer 1

In a regex, [...] introduces a character class, so [million] is the same as [nolim] and matches just one of those characters.
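A quick way to see this is to test every lowercase letter against both classes, and to compare the class with a plain literal. This is a small illustrative check, not part of the original answer:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A bracketed class matches exactly ONE character from the set, so
# [million] accepts the same letters as [nolim]; repeated letters add nothing.
my @class_letters = grep { /^[million]$/ } 'a' .. 'z';
my @nolim_letters = grep { /^[nolim]$/ }   'a' .. 'z';
print "[million] matches: @class_letters\n";
print "[nolim]   matches: @nolim_letters\n";

# The class can never match the whole word; a plain literal can.
print "'million' =~ /^[million]\$/ : ", ('million' =~ /^[million]$/ ? 1 : 0), "\n";
print "'million' =~ /^million\$/   : ", ('million' =~ /^million$/   ? 1 : 0), "\n";
```

Both classes report the same five letters (i l m n o), and only the literal pattern matches the word.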

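I'd put the currencies into a conversion table stored in a hash. From the hash's keys you can build a regex that matches any of them, and then use the hash in the replacement: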

#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use feature qw{ say };

my %currency = ( '$' => 'dollar',  # or dollars?
                 eur => 'euros',
                 '€' => 'euros',
);

my $regex = join '|', map quotemeta, keys %currency;

for my $input ('$100 million', 'eur20 million', '€13.2 thousand') {
    ( my $output = $input )
        =~ s/($regex)([0-9.]+ (?:million|thousand))/$2 $currency{$1}/g;
    say $output;
}
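As a sketch of how this hash-driven approach handles the exact inputs from the question, the substitution can be wrapped in a helper. The name normalize_currency is hypothetical, and the plural "dollars" plus the million/thousand word list are assumptions chosen to match the question's expected output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper built on the hash-driven approach above.
# Assumptions: plural currency words, and only "million"/"thousand"
# as magnitude words after the amount.
my %currency = ( '$' => 'dollars', eur => 'euros' );
my $symbol   = join '|', map quotemeta, keys %currency;

sub normalize_currency {
    my ($text) = @_;
    # First capture: currency symbol; second capture: amount plus magnitude word.
    $text =~ s/($symbol)([0-9.]+ (?:million|thousand))/$2 $currency{$1}/g;
    return $text;
}

for my $input ('$100 million', 'eur20 million', '$4.2 million') {
    print normalize_currency($input), "\n";
}
```

Supporting another currency is then just another hash entry; text without a known symbol is returned unchanged.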

Answer 2

Your regex doesn't give the result you claim it does.

s/([\$])([0-9.])([million])/ $2 $3 dollars/g;

With the /x modifier, you can add whitespace (even newlines and comments) to the pattern to make it more readable. Your pattern can then be rewritten as

s/([\$])        # match a literal $ and capture that as $1
  ([0-9.])      # match ONE digit or a dot and capture as $2
  ([million])   # match ONE character of 'm', 'i', 'l', 'o', 'n'
                # and capture as $3
 / $2 $3 dollars/gx;

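$100 million cannot match this pattern at all, so it cannot turn into .2 million. Inputs that would match are, for example, $3i, $.o or $9m. They would produce 3 i dollars, . o dollars and 9 m dollars (each with a leading space, because the replacement string starts with one).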
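To confirm this, the original substitution can be run unchanged against a few inputs. The helper name asker_subst is hypothetical; the pattern and replacement are copied from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's substitution, unchanged: it consumes exactly three
# characters (a '$', one digit-or-dot, one letter from the class).
sub asker_subst {
    my ($text) = @_;
    $text =~ s/([\$])([0-9.])([million])/ $2 $3 dollars/g;
    return $text;
}

for my $input ('$100 million', '$4.2 million', '$3i', '$.o', '$9m') {
    printf "%-14s => '%s'\n", $input, asker_subst($input);
}
```

The two realistic inputs come back unchanged, while the three short ones are rewritten as described.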

What you are looking for is a pattern like this:

s/\$        # a literal '$'
  ([\d.]+)  # one or more digits or dots, like e.g. '99.5',
            # captured as $1
  \s+       # one or more whitespace
  (million) # the literal text 'million', captured as $2
  /$1 $2 dollars/gx;

(Or, as a one-liner: s/\$([\d.]+)\s+(million)/$1 $2 dollars/g;)

Note that $2 is always million in this case, so you could also write it as s/\$([\d.]+)\s+million/$1 million dollars/g; (omitting the () around million).
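A minimal sketch of that million-only version in use, assuming the input always has whitespace between the amount and the word million. The helper name dollars_to_words is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The simplified substitution, written with /x for readability.
sub dollars_to_words {
    my ($text) = @_;
    $text =~ s/\$          # a literal dollar sign
               ([\d.]+)    # digits or dots such as 99.5 (first capture)
               \s+         # one or more whitespace characters
               million     # the literal word million, not captured
              /$1 million dollars/gx;
    return $text;
}

for my $input ('$100 million', '$4.2 million', 'raised $3 million and $2 million') {
    print dollars_to_words($input), "\n";
}
```

Because \s+ is required, input without a space such as $100million is left unchanged; relaxing it to \s* is one option if such input occurs.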
