Converting words to numbers in PHP

Date: 2009-07-03 02:55:25

Tags: php word numbers

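I'm trying to convert numerical values written as words into integers. For example, "iPhone has two hundred and thirty thousand seven hundred and eighty-three apps" would become "iPhone has 230783 apps".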

Before I start coding, I'd like to know whether any function or code already exists for this conversion.

7 answers:

Answer 0 (score: 21)

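There are lots of pages discussing the conversion from numbers to words, but not so many for the reverse direction. The best I could find was some pseudo-code on Ask Yahoo. See http://answers.yahoo.com/question/index?qid=20090216103754AAONnDz for a nice algorithm:

Well, overall you are doing two things: finding tokens (words that translate to numbers) and applying grammar. In short, you are building a parser for a very limited language.

The tokens you would need are:

POWER: thousand, million, billion
HUNDRED: hundred
TEN: twenty, thirty, ..., ninety
UNIT: one, two, three, ..., nine
SPECIAL: ten, eleven, twelve, ..., nineteen

(Drop any "and"s, as they are meaningless. Break hyphens into two tokens. That is, sixty-five should be processed as "sixty" "five".)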

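Once you've tokenized your string, move from RIGHT TO LEFT.

  1. Grab all the tokens from the RIGHT until you hit a POWER or the whole string.

  2. Parse the tokens after the stop point for these patterns:

       SPECIAL
       TEN
       UNIT
       TEN UNIT
       UNIT HUNDRED
       UNIT HUNDRED SPECIAL
       UNIT HUNDRED TEN
       UNIT HUNDRED UNIT
       UNIT HUNDRED TEN UNIT

     (This assumes that "seventeen hundred" is not allowed in this grammar.)

     This gives you the last three digits of your number.

  3. If you stopped at the whole string, you are done.

  4. If you stopped at a POWER, start again at step 1 until you reach a higher POWER or the whole string.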
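The right-to-left procedure above can be sketched in PHP roughly as follows. This is a minimal sketch, not the Yahoo author's code: the names `wordsToNumberRtl` and `parseGroup` are hypothetical, and it assumes English input whose value fits in a PHP int, ignores "and", splits on hyphens, and throws `InvalidArgumentException` on anything outside the grammar. Each group is validated by mapping its tokens to class letters (U = UNIT, S = SPECIAL, T = TEN, H = HUNDRED) and matching the resulting string against the patterns listed above.

```php
<?php
// Sketch of the right-to-left algorithm quoted above. Hypothetical names;
// assumes English words and results that fit in a PHP int, and throws
// InvalidArgumentException for input outside the grammar.

const UNITS = ['one' => 1, 'two' => 2, 'three' => 3, 'four' => 4, 'five' => 5,
    'six' => 6, 'seven' => 7, 'eight' => 8, 'nine' => 9];
const SPECIALS = ['ten' => 10, 'eleven' => 11, 'twelve' => 12, 'thirteen' => 13,
    'fourteen' => 14, 'fifteen' => 15, 'sixteen' => 16, 'seventeen' => 17,
    'eighteen' => 18, 'nineteen' => 19];
const TENS = ['twenty' => 20, 'thirty' => 30, 'forty' => 40, 'fifty' => 50,
    'sixty' => 60, 'seventy' => 70, 'eighty' => 80, 'ninety' => 90];
const POWERS = ['thousand' => 1000, 'million' => 1000000, 'billion' => 1000000000];

// Parse one group (0-999). Tokens are mapped to class letters and the sequence
// must be one of: S, T, U, TU, UH, UHS, UHT, UHU, UHTU.
function parseGroup(array $tokens): int
{
    $classes = '';
    $value = 0;
    foreach ($tokens as $t) {
        if (array_key_exists($t, UNITS)) {
            $classes .= 'U';
            $value += UNITS[$t];
        } elseif (array_key_exists($t, SPECIALS)) {
            $classes .= 'S';
            $value += SPECIALS[$t];
        } elseif (array_key_exists($t, TENS)) {
            $classes .= 'T';
            $value += TENS[$t];
        } elseif ($t === 'hundred') {
            $classes .= 'H';
            $value *= 100; // only valid right after a UNIT
        } else {
            throw new InvalidArgumentException("Unknown token: $t");
        }
    }
    if (!preg_match('/^(?:UH(?:S|TU?|U)?|S|TU?|U)$/', $classes)) {
        throw new InvalidArgumentException('Invalid group: ' . implode(' ', $tokens));
    }
    return $value;
}

function wordsToNumberRtl(string $words): int
{
    // Tokenize: lowercase, split on whitespace and hyphens, drop "and".
    $tokens = preg_split('/[\s-]+/', strtolower(trim($words)), -1, PREG_SPLIT_NO_EMPTY);
    $tokens = array_values(array_filter($tokens, fn ($t) => $t !== 'and'));
    if ($tokens === []) {
        throw new InvalidArgumentException('No number words found');
    }

    $total = 0;
    $multiplier = 1; // POWER to the right of the group being collected
    $group = [];     // tokens grabbed since the last stop, right to left

    for ($i = count($tokens) - 1; $i >= 0; $i--) {
        $t = $tokens[$i];
        if (!array_key_exists($t, POWERS)) {
            $group[] = $t; // keep grabbing tokens from the right
            continue;
        }
        // Stopped at a POWER: parse the group collected so far.
        // Only the rightmost group may be empty, as in "two thousand".
        if ($group !== [] || $multiplier !== 1) {
            $total += parseGroup(array_reverse($group)) * $multiplier;
        }
        // A new POWER must be higher than the one to its right.
        if (POWERS[$t] <= $multiplier) {
            throw new InvalidArgumentException("Unexpected power: $t");
        }
        $multiplier = POWERS[$t];
        $group = [];
    }
    // Reached the start of the string: parse the leftmost group.
    return $total + parseGroup(array_reverse($group)) * $multiplier;
}

echo wordsToNumberRtl('two hundred and thirty thousand seven hundred and eighty-three'), "\n";
```

For example, the question's phrase yields 230783, while "seventeen hundred" is rejected because SPECIAL HUNDRED is not one of the listed patterns.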

Answer 1 (score: 20)

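Old question, but for anyone else who comes across this: I had to write a solution today. The following takes an approach vaguely similar to the algorithm described by John Kugelman, but doesn't apply a strict grammar; as a result it allows some odd orderings, e.g. "one hundred thousand and one million" still produces the same number as "one million one hundred thousand" (1,100,000). Invalid bits (e.g. misspelled numbers) are ignored, so treat the output for invalid strings as undefined.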

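Following user132513's comment on joebert's answer, I used PEAR's Numbers_Words to generate a test series. The code below scores 100% on the numbers from 0 to 5,000,000, and then 100% on a random sample of 100,000 numbers between 0 and 10,000,000 (running over the whole 10 billion series takes too long).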

/**
 * Convert a string such as "one hundred thousand" to 100000.00.
 *
 * @param string $data The numeric string.
 *
 * @return float
 */
function wordsToNumber($data) {
    // Replace all number words with an equivalent numeric value
    $data = strtr(
        $data,
        array(
            'zero'      => '0',
            'a'         => '1',
            'one'       => '1',
            'two'       => '2',
            'three'     => '3',
            'four'      => '4',
            'five'      => '5',
            'six'       => '6',
            'seven'     => '7',
            'eight'     => '8',
            'nine'      => '9',
            'ten'       => '10',
            'eleven'    => '11',
            'twelve'    => '12',
            'thirteen'  => '13',
            'fourteen'  => '14',
            'fifteen'   => '15',
            'sixteen'   => '16',
            'seventeen' => '17',
            'eighteen'  => '18',
            'nineteen'  => '19',
            'twenty'    => '20',
            'thirty'    => '30',
            'forty'     => '40',
            'fourty'    => '40', // common misspelling
            'fifty'     => '50',
            'sixty'     => '60',
            'seventy'   => '70',
            'eighty'    => '80',
            'ninety'    => '90',
            'hundred'   => '100',
            'thousand'  => '1000',
            'million'   => '1000000',
            'billion'   => '1000000000',
            'and'       => '',
        )
    );

    // Coerce all tokens to numbers
    $parts = array_map(
        function ($val) {
            return floatval($val);
        },
        preg_split('/[\s-]+/', $data)
    );

    $stack = new SplStack; // Current work stack
    $sum   = 0; // Running total
    $last  = null;

    foreach ($parts as $part) {
        if (!$stack->isEmpty()) {
            // We're part way through a phrase
            if ($stack->top() > $part) {
                // Decreasing step, e.g. from hundreds to ones
                if ($last >= 1000) {
                    // If we drop from more than 1000 then we've finished the phrase
                    $sum += $stack->pop();
                    // This is the first element of a new phrase
                    $stack->push($part);
                } else {
                    // Drop down from less than 1000, just addition
                    // e.g. "seventy one" -> "70 1" -> "70 + 1"
                    $stack->push($stack->pop() + $part);
                }
            } else {
                // Increasing step, e.g ones to hundreds
                $stack->push($stack->pop() * $part);
            }
        } else {
            // This is the first element of a new phrase
            $stack->push($part);
        }

        // Store the last processed part
        $last = $part;
    }

    return $sum + $stack->pop();
}

Answer 2 (score: 4)

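I haven't tested this very extensively; I more or less just worked on it until I saw what I expected in the output. But it seems to work, and it parses from left to right.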

<?php

$str = 'twelve billion people know iPhone has two hundred and thirty thousand, seven hundred and eighty-three apps as well as over one million units sold';

function strlen_sort($a, $b)
{
    if(strlen($a) > strlen($b))
    {
        return -1;
    }
    else if(strlen($a) < strlen($b))
    {
        return 1;
    }
    return 0;
}

$keys = array(
    'one' => '1', 'two' => '2', 'three' => '3', 'four' => '4', 'five' => '5', 'six' => '6', 'seven' => '7', 'eight' => '8', 'nine' => '9',
    'ten' => '10', 'eleven' => '11', 'twelve' => '12', 'thirteen' => '13', 'fourteen' => '14', 'fifteen' => '15', 'sixteen' => '16', 'seventeen' => '17', 'eighteen' => '18', 'nineteen' => '19',
    'twenty' => '20', 'thirty' => '30', 'forty' => '40', 'fifty' => '50', 'sixty' => '60', 'seventy' => '70', 'eighty' => '80', 'ninety' => '90',
    'hundred' => '100', 'thousand' => '1000', 'million' => '1000000', 'billion' => '1000000000'
);


preg_match_all('#((?:^|and|,| |-)*(\b' . implode('\b|\b', array_keys($keys)) . '\b))+#i', $str, $tokens);
//print_r($tokens); exit;
$tokens = $tokens[0];
usort($tokens, 'strlen_sort');

foreach($tokens as $token)
{
    $token = trim(strtolower($token));
    // Match each number word on its own; the alternation is grouped so \b applies to every word
    preg_match_all('#\b(?:' . implode('|', array_keys($keys)) . ')\b#', $token, $words);
    $words = $words[0];
    //print_r($words);
    $num = '0'; $total = 0;
    foreach($words as $word)
    {
        $word = trim($word);
        $val = $keys[$word];
        //echo "$val\n";
        if(bccomp($val, 100) == -1)
        {
            $num = bcadd($num, $val);
            continue;
        }
        else if(bccomp($val, 100) == 0)
        {
            $num = bcmul($num, $val);
            continue;
        }
        $num = bcmul($num, $val);
        $total = bcadd($total, $num);
        $num = '0';
    }
    $total = bcadd($total, $num);
    echo "$total:$token\n";
    $str = preg_replace("#\b$token\b#i", number_format($total), $str);
}
echo "\n$str\n";

?>

Answer 3 (score: 2)

A slight update to El Yobo's answer: the wordsToNumber function can now be run on (almost) any string that contains numerals. See the tests below.

<?php

class Converter {

    /**
     * Convert numerals to digits
     * @param string $input
     *
     * @return string
     */
    public static function wordsToNumber(string $input)
    {
        static $delims = " \-,.!?:;\\/&\(\)\[\]";
        static $tokens = [
            'zero'        => ['val' => '0', 'power' => 1],
            'a'           => ['val' => '1', 'power' => 1],
            'first'       => ['val' => '1', 'suffix' => 'st', 'power' => 1],
            'one'         => ['val' => '1', 'power' => 1],
            'second'      => ['val' => '2', 'suffix' => 'nd', 'power' => 1],
            'two'         => ['val' => '2', 'power' => 1],
            'third'       => ['val' => '3', 'suffix' => 'rd', 'power' => 1],
            'three'       => ['val' => '3', 'power' => 1],
            'fourth'      => ['val' => '4', 'suffix' => 'th', 'power' => 1],
            'four'        => ['val' => '4', 'power' => 1],
            'fifth'       => ['val' => '5', 'suffix' => 'th', 'power' => 1],
            'five'        => ['val' => '5', 'power' => 1],
            'sixth'       => ['val' => '6', 'suffix' => 'th', 'power' => 1],
            'six'         => ['val' => '6', 'power' => 1],
            'seventh'     => ['val' => '7', 'suffix' => 'th', 'power' => 1],
            'seven'       => ['val' => '7', 'power' => 1],
            'eighth'      => ['val' => '8', 'suffix' => 'th', 'power' => 1],
            'eight'       => ['val' => '8', 'power' => 1],
            'ninth'       => ['val' => '9', 'suffix' => 'th', 'power' => 1],
            'nine'        => ['val' => '9', 'power' => 1],
            'tenth'       => ['val' => '10', 'suffix' => 'th', 'power' => 1],
            'ten'         => ['val' => '10', 'power' => 10],
            'eleventh'    => ['val' => '11', 'suffix' => 'th', 'power' => 10],
            'eleven'      => ['val' => '11', 'power' => 10],
            'twelveth'    => ['val' => '12', 'suffix' => 'th', 'power' => 10],
            'twelfth'    => ['val' => '12', 'suffix' => 'th', 'power' => 10],
            'twelve'      => ['val' => '12', 'power' => 10],
            'thirteenth'  => ['val' => '13', 'suffix' => 'th', 'power' => 10],
            'thirteen'    => ['val' => '13', 'power' => 10],
            'fourteenth'  => ['val' => '14', 'suffix' => 'th', 'power' => 10],
            'fourteen'    => ['val' => '14', 'power' => 10],
            'fifteenth'   => ['val' => '15', 'suffix' => 'th', 'power' => 10],
            'fifteen'     => ['val' => '15', 'power' => 10],
            'sixteenth'   => ['val' => '16', 'suffix' => 'th', 'power' => 10],
            'sixteen'     => ['val' => '16', 'power' => 10],
            'seventeenth' => ['val' => '17', 'suffix' => 'th', 'power' => 10],
            'seventeen'   => ['val' => '17', 'power' => 10],
            'eighteenth'  => ['val' => '18', 'suffix' => 'th', 'power' => 10],
            'eighteen'    => ['val' => '18', 'power' => 10],
            'nineteenth'  => ['val' => '19', 'suffix' => 'th', 'power' => 10],
            'nineteen'    => ['val' => '19', 'power' => 10],
            'twentieth'   => ['val' => '20', 'suffix' => 'th', 'power' => 10],
            'twenty'      => ['val' => '20', 'power' => 10],
            'thirty'      => ['val' => '30', 'power' => 10],
            'forty'       => ['val' => '40', 'power' => 10],
            'fourty'      => ['val' => '40', 'power' => 10], // common misspelling
            'fifty'       => ['val' => '50', 'power' => 10],
            'sixty'       => ['val' => '60', 'power' => 10],
            'seventy'     => ['val' => '70', 'power' => 10],
            'eighty'      => ['val' => '80', 'power' => 10],
            'ninety'      => ['val' => '90', 'power' => 10],
            'hundred'     => ['val' => '100', 'power' => 100],
            'thousand'    => ['val' => '1000', 'power' => 1000],
            'million'     => ['val' => '1000000', 'power' => 1000000],
            'billion'     => ['val' => '1000000000', 'power' => 1000000000],
            'and'         => ['val' => '', 'power' => null],
            '-'           => ['val' => '', 'power' => null],
        ];
        $powers = array_column($tokens, 'power', 'val');

        $mutate = function ($parts) use (&$mutate, $powers){
            $stack = new \SplStack;
            $sum   = 0;
            $last  = null;

            foreach ($parts as $idx => $arr) {
                $part = $arr['val'];

                if (!$stack->isEmpty()) {
                    $check = $last ?? $part;

                    if ((float)$stack->top() < 20 && (float)$part < 20 ?? (float)$part < $stack->top() ) { // skip special numerals
                        return $stack->top().(isset($parts[$idx - $stack->count()]['suffix']) ? $parts[$idx - $stack->count()]['suffix'] : '')." ".$mutate(array_slice($parts, $idx));
                    }
                    if (isset($powers[$check]) && $powers[$check] <= $arr['power'] && $arr['power'] <= 10) { // but still combine powers (hundreds, thousands, millions, etc.)
                        return $stack->top().(isset($parts[$idx - $stack->count()]['suffix']) ? $parts[$idx - $stack->count()]['suffix'] : '')." ".$mutate(array_slice($parts, $idx));
                    }
                    if ($stack->top() > $part) {
                        if ($last >= 1000) {
                            $sum += $stack->pop();
                            $stack->push($part);
                        } else {
                            // twenty one -> "20 1" -> "20 + 1"
                            $stack->push($stack->pop() + (float) $part);
                        }
                    } else {
                        $stack->push($stack->pop() * (float) $part);
                    }
                } else {
                    $stack->push($part);
                }

                $last = $part;
            }

            return $sum + $stack->pop();
        };

        $prepared = preg_split('/(['.$delims.'])/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

        // Replace words with tokens
        foreach ($prepared as $idx => $word) {
            if (is_array($word)) {continue;}
            $maybeNumPart = trim(strtolower($word));
            if (isset($tokens[$maybeNumPart])) {
                $item = $tokens[$maybeNumPart];
                if (isset($prepared[$idx+1])) {
                    $maybeDelim = $prepared[$idx+1];
                    if ($maybeDelim === " ") {
                        $item['delim'] = $maybeDelim;
                        unset($prepared[$idx + 1]);
                    } elseif ($item['power'] == null && !isset($tokens[$maybeDelim])) {
                        continue;
                    }
                }
                $prepared[$idx] = $item;
            }
        }

        $result      = [];
        $accumulator = [];

        $getNumeral = function () use ($mutate, &$accumulator, &$result) {
            $last        = end($accumulator);
            $result[]    = $mutate($accumulator).(isset($last['suffix']) ? $last['suffix'] : '').(isset($last['delim']) ? $last['delim'] : '');
            $accumulator = [];
        };

        foreach ($prepared as $part) {
            if (is_array($part)) {
                $accumulator[] = $part;
            } else {
                if (!empty($accumulator)) {
                    $getNumeral();
                }
                $result[] = $part;
            }
        }
        if (!empty($accumulator)) {
            $getNumeral();
        }

        return implode('', array_filter($result));
    }
}

$testStrings = [
    'thirty thirty eighty one one eighty' => '30 30 81 1 80',
    'twenty twenty' => '20 20',
    'twelfth eleventh tenth' => '12th 11th 10th',
    'ten eleven twelve' => '10 11 12',
    'one two five zero' => '1 2 5 0',
    'One First Two' => '1 1st 2',
    'One First Two Second Three Third Four Fourth Five Fifth Six Sixth Seven' => '1 1st 2 2nd 3 3rd 4 4th 5 5th 6 6th 7',
    'Bus number fifteen from bus stop number Eighty three thousand one hundred thirty nine' => 'Bus number 15 from bus stop number 83139',
    'get the fifteenth cookie from fifth jar on second left shelf' => 'get the 15th cookie from 5th jar on 2nd left shelf',
    'One hundred million monkeys could not write second Macbeth' => '100000000 monkeys could not write 2nd Macbeth',
    'Taganskaya str. thirty two, three hundred fifty six' => 'Taganskaya str. 32, 356',
    'Lenina str 56/17 b. one hundred seven' => 'Lenina str 56/17 b. 107',
    'Paris & Hilton road, twenty two, house 356' => 'Paris & Hilton road, 22, house 356',
    'Wien, Wilhelmstraße zwei hundert sieben und dreißig' => 'Wien, Wilhelmstraße zwei hundert sieben und dreißig',
    'Vienna, Wilhelmstrasse two hundred and thirty seven' => 'Vienna, Wilhelmstrasse 237',
];

$converter = new Converter();
foreach ($testStrings as $input => $expected) {
    $output = $converter::wordsToNumber($input);
    echo $input."\t=>\t".$output."\n";
    if ($output != $expected) { die("words to number conversion failed!");}
}

Answer 4 (score: 1)

The simplest way I've found is to use numfmt_parse:

$fmt = numfmt_create('en_US', NumberFormatter::SPELLOUT);
echo numfmt_parse($fmt, 'one million two hundred thirty-four thousand five hundred sixty-seven');

(Source: Dorian's post at https://stackoverflow.com/a/31588055/11827985)
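The same call can also be written with the NumberFormatter class. This is a minimal sketch assuming the intl extension is installed; `spelledOutToNumber` is a hypothetical helper name, not part of intl. Because NumberFormatter::parse() returns false when it cannot parse the text, the sketch turns that into an exception instead of silently echoing an empty value.

```php
<?php
// Sketch: the numfmt_parse call above in object-oriented form, with an
// explicit failure check. Requires the intl extension; spelledOutToNumber
// is a hypothetical helper name, not part of intl.

function spelledOutToNumber(string $words, string $locale = 'en_US')
{
    $fmt = new NumberFormatter($locale, NumberFormatter::SPELLOUT);
    // parse() returns a number (float by default), or false on failure.
    $value = $fmt->parse($words);
    if ($value === false) {
        throw new InvalidArgumentException($fmt->getErrorMessage());
    }
    return $value;
}

echo spelledOutToNumber('one million two hundred thirty-four thousand five hundred sixty-seven'), "\n";
```

Phrasing that ICU's English spell-out rules don't produce themselves (for instance extra "and"s) may not parse, so it is worth testing against your own input.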

Answer 5 (score: 0)

The PEAR Numbers_Words package might be a good start: http://pear.php.net/package-info.php?package=Numbers_Words

Answer 6 (score: -2)

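There are some errors in the script you mentioned; please check it once from a developer's point of view. For example, 83139 gives a different answer when you write it in words.

Pass the string below and check everything:

Bus number fifteen from bus stop number eighty three thousand one hundred thirty nine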