Converting numbers to English words

Time: 2014-08-06 00:08:47

Tags: python nlp

Sites such as http://www.easysurf.cc/cnvert18.htm and http://www.calculatorsoup.com/calculators/conversions/numberstowords.php try to convert a numeric string into English words, but their output does not sound natural.

For example, on http://www.easysurf.cc/cnvert18.htm:

[in]: 100456
[out]:  one hundred  thousand four hundred fifty-six

This site is a bit better, http://www.calculator.org/calculate-online/mathematics/text-number.aspx:

[in]: 100456
[out]: one hundred thousand, four hundred and fifty-six

[in]: 10123124001
[out]: ten billion, one hundred and twenty-three million, one hundred and twenty-four thousand, one 

But it breaks in some cases:

[in]: 10000000001
[out]: ten billion, , , one 

I have written my own version (http://pastebin.com/WwFCjYtt), but it involves a lot of rules and is capped at one billion:

import codecs

def num2word (num):
  ones = {1:"one",2:"two",3:"three",4:"four",
          5:"five",6:"six",7:"seven",8:"eight",
          9:"nine",0:"zero",10:"ten"}
  teens = {11:"eleven",12:"twelve",13:"thirteen",
           14:"fourteen",15:"fifteen"}
  tens = {2:"twenty",3:"thirty",4:"forty",
          5:"fifty",6:"sixty",7:"seventy",
          8:"eighty",9:"ninety"}
  lens = {3:"hundred",4:"thousand",6:"hundred",7:"million",
          8:"million", 9:"million",10:"billion"#,13:"trillion",11:"googol",
          }

  if num > 999999999:
    return "Number more than 1 billion"

  # Ones
  if num < 11:
    return ones[num]
  # Teens
  if num < 20:
    word = ones[num%10] + "teen" if num > 15 else teens[num]
    return word
  # Tens
  if num > 19 and num < 100:
    word = tens[int(str(num)[0])]
    if str(num)[1] == "0":
      return word
    else:
      word = word + " " + ones[num%10]
      return word

  # First digit for thousands,hundred-thousands.
  if len(str(num)) in lens and len(str(num)) != 3:
    word = ones[int(str(num)[0])] + " " + lens[len(str(num))]
  else:
    word = ""

  # Hundred to Million  
  if num < 1000000:
    # First and Second digit for ten thousands.  
    if len(str(num)) == 5:
      word = num2word(int(str(num)[0:2])) + " thousand"
    # How many hundred-thousand(s).
    if len(str(num)) == 6:
      word = word + " " + num2word(int(str(num)[1:3])) + \
            " " + lens[len(str(num))-2]
    # How many hundred(s)?
    thousand_pt = len(str(num)) - 3
    word = word + " " + ones[int(str(num)[thousand_pt])] + \
            " " + lens[len(str(num))-thousand_pt]
    # Last 2 digits.
    last2 = num2word(int(str(num)[-2:]))
    if last2 != "zero":
      word = word + " and " + last2
    word = word.replace(" zero hundred","")
    return word.strip()

  left, right = '',''  
  # From 1 million up to (but not including) 100 million.
  if num < 100000000:
    left = num2word(int(str(num)[:-6])) + " " + lens[len(str(num))]
    right = num2word(int(str(num)[-6:]))
  # From 100 million up to (but not including) 1 billion.
  if num >= 100000000 and num < 1000000000:
    left = num2word(int(str(num)[:3])) +  " " + lens[len(str(num))]
    right = num2word(int(str(num)[-6:]))
  if int(str(num)[-6:]) < 100:
    word = left + " and " + right
  else:  
    word = left + " " + right
  word = word.replace(" zero hundred","").replace(" zero thousand"," thousand")
  return word

print(num2word(int(input("Give me a number:\n"))))

How can I make my script accept numbers greater than a billion?

Is there another way to get the same output?

Can my code be written less verbosely?

1 Answer:

Answer 0: (score: 3)

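A more general approach to this problem is to use repeated division (i.e. divmod) and hard-code only the special/edge cases that are really necessary.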

For example, divmod(1034393, 1000000) -> (1, 34393), so you have effectively found the number of millions and are left with a remainder for further calculation.
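As a minimal sketch of that top-down idea (the names UNITS and peel_units are hypothetical, not part of the answer), each unit can be peeled off in turn with divmod, carrying the remainder forward:

```python
# Hypothetical helper: peel off billions, millions and thousands in turn.
# UNITS is an assumed table; extend it for larger numbers.
UNITS = [(10**9, 'billion'), (10**6, 'million'), (10**3, 'thousand')]

def peel_units(num):
    """Return ([(count, unit_name), ...], remainder) for a non-negative int."""
    counts = []
    for value, name in UNITS:
        # count is how many of this unit fit; num becomes what is left over.
        count, num = divmod(num, value)
        if count:
            counts.append((count, name))
    return counts, num

print(peel_units(1034393))
```

For 1034393 this gives one million, thirty-four thousand, and a remainder of 393 that still has to be spelled out.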

A possibly more illustrative example: divmod(1034393, 1000) -> (1034, 393) lets you take off groups of 3 decimal digits at a time from the right.
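Repeating that step until nothing is left splits the number into three-digit groups. The sketch below assumes a non-negative integer; split_groups is a hypothetical name used only for illustration:

```python
def split_groups(num):
    """Split a non-negative int into 3-digit groups, least significant first."""
    groups = []
    while True:
        # Take the lowest three decimal digits off the right-hand side.
        num, group = divmod(num, 1000)
        groups.append(group)
        if num == 0:
            break
    return groups

print(split_groups(1034393))
print(split_groups(10000000001))
```

The zero groups in the second call are the ones that show up as empty slots in the "ten billion, , , one" output quoted in the question, so a converter has to skip them explicitly.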

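In English we tend to group digits in threes, and similar rules apply to each group. This should be parameterized, not hard-coded. For example, "303" could be three hundred and three million, three hundred and three thousand, or three hundred and three. The logic should be the same except for the suffix, which depends on which place you are in. Edit: it looks like that is already the case, because of the recursion.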
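One way to parameterize the suffix, sketched here under the assumption of short-scale names (SCALE_NAMES and label_groups are hypothetical), is to look it up by group position instead of by the length of the digit string:

```python
# Assumed short-scale suffixes, indexed by group position from the right.
# Numbers of 10**15 or more would need a longer list.
SCALE_NAMES = ['', 'thousand', 'million', 'billion', 'trillion']

def label_groups(num):
    """Pair each non-zero 3-digit group with its suffix, most significant first."""
    labelled = []
    position = 0
    while num:
        num, group = divmod(num, 1000)
        if group:  # skip empty groups entirely
            labelled.append((group, SCALE_NAMES[position]))
        position += 1
    labelled.reverse()
    return labelled

print(label_groups(303303303))
```

Every group here is the same value, 303, and only the suffix changes, so the code that spells out a group never needs to know where the group sits.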

Here is a partial example of the kind of approach I mean, using a generator and operating on integers rather than doing lots of int(str(i)[..]) everywhere:

say_base = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
    'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
    'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen']

say_tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
    'eighty', 'ninety']

def hundreds_i(num):
    # Yield the pieces of the English phrase for 1..999 (nothing for 0).
    hundreds, rest = divmod(num, 100)
    if hundreds:
        yield say_base[hundreds]
        yield ' hundred'
    if hundreds and rest:
        # Only join with "and" when a hundreds part precedes the remainder.
        yield ' and '
    if 0 < rest < len(say_base):
        yield say_base[rest]
    elif rest != 0:
        tens, ones = divmod(rest, 10)
        yield say_tens[tens]
        if ones > 0:
            yield '-'
            yield say_base[ones]

assert "".join(hundreds_i(245)) == "two hundred and forty-five"
assert "".join(hundreds_i(999)) == 'nine hundred and ninety-nine'
assert "".join(hundreds_i(200)) == 'two hundred'
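To show how the partial example might be extended past a billion, here is a self-contained sketch that combines the three-digit grouping with a suffix table. The names SAY_BASE, SAY_TENS, SCALES, say_group and say_number are hypothetical, and the punctuation (commas between groups, "and" before a final group below one hundred) is one assumed style, not the only valid one.

```python
SAY_BASE = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
            'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen',
            'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen',
            'nineteen']
SAY_TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty',
            'seventy', 'eighty', 'ninety']
# Assumed short-scale names; append more entries to raise the upper limit.
SCALES = ['', 'thousand', 'million', 'billion', 'trillion']

def say_group(num):
    """Words for one group in 1..999; returns '' for 0."""
    hundreds, rest = divmod(num, 100)
    parts = []
    if hundreds:
        parts.append(SAY_BASE[hundreds] + ' hundred')
    if rest:
        if rest < len(SAY_BASE):
            parts.append(SAY_BASE[rest])
        else:
            tens, ones = divmod(rest, 10)
            parts.append(SAY_TENS[tens] + ('-' + SAY_BASE[ones] if ones else ''))
    return ' and '.join(parts)

def say_number(num):
    """Spell out a non-negative int smaller than 1000 ** len(SCALES)."""
    if num < 0 or num >= 1000 ** len(SCALES):
        raise ValueError('number out of supported range')
    if num == 0:
        return SAY_BASE[0]
    last_group = num % 1000
    chunks = []
    scale = 0
    while num:
        num, group = divmod(num, 1000)
        if group:  # empty groups are skipped, so no ", ," appears
            words = say_group(group)
            if SCALES[scale]:
                words += ' ' + SCALES[scale]
            chunks.append(words)
        scale += 1
    chunks.reverse()
    if len(chunks) > 1 and 0 < last_group < 100:
        # e.g. "ten billion and one" rather than "ten billion, one"
        return ', '.join(chunks[:-1]) + ' and ' + chunks[-1]
    return ', '.join(chunks)

print(say_number(100456))
print(say_number(10000000001))
```

With this table the limit is set only by the length of SCALES, so supporting larger numbers means adding names rather than new branches.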