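Ordinal numbers replacement

Time: 2012-03-10 14:27:50

Tags: python nlp nltk ordinals

I am currently looking for a way to replace words like first, second, third, ... with the appropriate ordinal number representation (1st, 2nd, 3rd). I have been googling for the last week and have not found any useful standard tool or any function in NLTK.

So is there one, or should I write some regular expressions manually?

Thanks for any advice

18 answers:

Answer 0: (score: 88)

Here's a terse solution taken from Gareth on codegolf (written for Python 2, where / between integers floors the result):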

ordinal = lambda n: "%d%s" % (n,"tsnrhtdd"[(n/10%10!=1)*(n%10<4)*n%10::4])

Works on any number:

print([ordinal(n) for n in range(1,32)])

['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th',
 '11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th',
 '20th', '21st', '22nd', '23rd', '24th', '25th', '26th', '27th', '28th',
 '29th', '30th', '31st']

For Python 3.4+, math.floor is needed, because / is true division there (n//10 would also work):

import math
ordinal = lambda n: "%d%s" % (n,"tsnrhtdd"[(math.floor(n/10)%10!=1)*(n%10<4)*n%10::4])
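For readers puzzling over the slice, here is an unrolled reading of the one-liner. It is only a sketch: the name ordinal_unrolled and the intermediate variables are made up for illustration, and it assumes n is a non-negative integer.

```python
def ordinal_unrolled(n):
    """Step-by-step reading of the one-liner (assumes n is a non-negative int)."""
    tens_digit = n // 10 % 10   # floor division, so this also works on Python 3
    last_digit = n % 10
    # k is the last digit when it is 1, 2 or 3 and the number is not a "teen"
    # (tens digit 1); in every other case one of the factors is 0, so k is 0.
    k = (tens_digit != 1) * (last_digit < 4) * last_digit
    # A slice with step 4 picks characters k and k + 4 of "tsnrhtdd":
    # k=0 gives "th", k=1 gives "st", k=2 gives "nd", k=3 gives "rd".
    return "%d%s" % (n, "tsnrhtdd"[k::4])


print(["tsnrhtdd"[k::4] for k in range(4)])
print([ordinal_unrolled(n) for n in (1, 2, 3, 4, 11, 12, 13, 21, 101, 111)])
```

The booleans act as 0 or 1 in the product, which is what lets the original avoid any if statement.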

Answer 1: (score: 8)

How about this:

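# Look at n % 100 so that 111, 112 and 113 get "th" just like 11, 12 and 13
suf = lambda n: "%d%s" % (n, {1: "st", 2: "nd", 3: "rd"}.get(n % 100 if n % 100 < 20 else n % 10, "th"))
print([suf(n) for n in range(1, 32)])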

['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th',
 '11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th',
 '20th', '21st', '22nd', '23rd', '24th', '25th', '26th', '27th', '28th',
 '29th', '30th', '31st']

Answer 2: (score: 7)

The accepted answer to a previous question has the algorithm for half of this: it turns "first" into 1. To get from there to "1st", do something like:

suffixes = ["th", "st", "nd", "rd", ] + ["th"] * 16
suffixed_num = str(num) + suffixes[num % 100]

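This only works when the last two digits of the number are in the range 0-19 (so 0-19, 100-119, ...); for anything else the list index is out of range.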
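Putting the two halves together could look like the sketch below. The word table is hand-written for illustration and only covers first to nineteenth (it stands in for the word-to-number step of the other question, which is not reproduced here), and the names ORDINAL_WORDS, PATTERN and replace_ordinal_words are hypothetical.

```python
import re

# Hypothetical hand-written table; extend it to cover a larger range.
ORDINAL_WORDS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
    "eleventh": 11, "twelfth": 12, "thirteenth": 13, "fourteenth": 14,
    "fifteenth": 15, "sixteenth": 16, "seventeenth": 17, "eighteenth": 18,
    "nineteenth": 19,
}
SUFFIXES = ["th", "st", "nd", "rd"] + ["th"] * 16  # valid for 0-19 only

# \b keeps "third" from matching inside a longer word such as "thirds".
PATTERN = re.compile(r"\b(" + "|".join(ORDINAL_WORDS) + r")\b", re.IGNORECASE)


def replace_ordinal_words(text):
    """Replace each known ordinal word in text with its numeric form."""
    def to_numeric(match):
        num = ORDINAL_WORDS[match.group(1).lower()]
        return str(num) + SUFFIXES[num]
    return PATTERN.sub(to_numeric, text)


print(replace_ordinal_words("The first and third runners, then the Twelfth."))
```

A plain word match cannot tell the ordinal "second" from the unit of time, so real text may need extra context checks.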

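Answer 3: (score: 6)

I wanted to use ordinals for a project of mine, and after a few prototypes I think this method, although not small, will work for any positive integer, yes, any integer.

It works by determining whether the number is above or below 20. If it is below 20, 1 becomes the string "1st", 2 becomes "2nd", 3 becomes "3rd", and the rest get "th" added.

For numbers of 20 and above it takes the last and second-to-last digits, which I have called the unit and the tens respectively, and tests them to see what to add to the number.

This is in Python, by the way, so I'm not sure whether other languages can pick out the last or second-to-last digit of a string; if they can, it should translate pretty easily.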

def o(numb):
    if numb < 20: #determining suffix for < 20
        if numb == 1: 
            suffix = 'st'
        elif numb == 2:
            suffix = 'nd'
        elif numb == 3:
            suffix = 'rd'
        else:
            suffix = 'th'  
    else:   #determining suffix for >= 20
        tens = str(numb)[-2]  # second-to-last digit
        unit = str(numb)[-1]  # last digit
        if tens == "1":
            suffix = "th"
        else:
            if unit == "1": 
                suffix = 'st'
            elif unit == "2":
                suffix = 'nd'
            elif unit == "3":
                suffix = 'rd'
            else:
                suffix = 'th'
    return str(numb)+ suffix

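For ease of use I called the function "o"; it can be used by importing the file, which I named "ordinal", with import ordinal and then calling ordinal.o(number).

Let me know what you think :D

Answer 4: (score: 6)

I found myself doing something similar, needing to convert addresses with ordinal numbers ('Third St') to a format that a geocoder could comprehend ('3rd St'). While this isn't very elegant, one quick and dirty solution is to use inflect.py to generate a dictionary for translation.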

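inflect.py has a number_to_words() function that turns a number (e.g. 2) into its word form (e.g. 'two'). Additionally, there is an ordinal() function that takes any number (numeral or word form) and turns it into its ordinal form (e.g. 4 -> 4th, six -> sixth). Neither of these does what you're looking for on its own, but together you can use them to generate a dictionary that translates any supplied ordinal number word (within a reasonable range) to its respective numeral ordinal.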
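A minimal sketch of that dictionary-building idea follows. The helper name build_ordinal_lookup and its parameters are hypothetical; the two conversion functions are passed in as callables so that the block itself has no third-party dependency.

```python
def build_ordinal_lookup(number_to_words, ordinal, upper=100):
    """Map ordinal words to numeric ordinals, e.g. 'sixth' -> '6th'.

    number_to_words and ordinal are supplied by the caller; this helper
    only combines them and does not know anything about English itself.
    """
    lookup = {}
    for i in range(1, upper):
        word_form = number_to_words(i)      # e.g. 6 -> 'six'
        ordinal_word = ordinal(word_form)   # e.g. 'six' -> 'sixth'
        ordinal_number = ordinal(i)         # e.g. 6 -> '6th'
        lookup[ordinal_word] = ordinal_number
    return lookup


# With inflect installed (pip install inflect), usage would look like:
#   import inflect
#   p = inflect.engine()
#   lookup = build_ordinal_lookup(p.number_to_words, p.ordinal)
#   lookup['sixth']
```

Building the table once up front keeps the per-address work down to a dictionary lookup.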

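If you're willing to commit some time, it might be possible to examine the inner workings of inflect.py in both of those functions and build your own code to do this dynamically (I haven't tried to do this).

Answer 5: (score: 4)

If you don't want to pull in an additional dependency on an external library (as suggested by luckydonald), but also don't want the future maintainer of the code to hunt you down and kill you (because you used golfed code in production), then here's a short but maintainable variant: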

def make_ordinal(n):
    '''
    Convert an integer into its ordinal representation::

        make_ordinal(0)   => '0th'
        make_ordinal(3)   => '3rd'
        make_ordinal(122) => '122nd'
        make_ordinal(213) => '213th'
    '''
    n = int(n)
    suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    if 11 <= (n % 100) <= 13:
        suffix = 'th'
    return str(n) + suffix

Answer 6: (score: 4)

Another solution is the num2words library (pip | github). It supports many languages, so localization/internationalization (a.k.a. l10n/i18n) is a no-brainer.

Install it with pip:

pip install num2words

After that, usage is simple:

from num2words import num2words
# english is default
num2words(4458, to="ordinal_num")
'4458rd'

# examples for other languages
num2words(4458, lang="en", to="ordinal_num")
'4458rd'

num2words(4458, lang="es", to="ordinal_num")
'4458º'

num2words(4458, lang="de", to="ordinal_num")
'4458.'

num2words(4458, lang="id", to="ordinal_num")
'ke-4458'

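Answer 7: (score: 2)

If using django, you could do:

from django.contrib.humanize.templatetags.humanize import ordinal
var = ordinal(number)

(or use ordinal in a django template as the template filter it was intended to be, though calling it like this from python code works as well)

If not using django, you could steal their implementation, which is very neat.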

Answer 8: (score: 1)

Import the humanize module and use its ordinal function.

import humanize
humanize.ordinal(4)

Output

>>> '4th'

Answer 9: (score: 1)

Try

import sys

a = int(sys.argv[1])

for i in range(1, a + 1):
    # 11, 12 and 13 (and 111, 112, ...) always take "th"
    if i % 100 in (11, 12, 13):
        print("%dth Hello" % i)
        continue
    # otherwise the last digit decides the suffix
    if i % 10 == 1:
        print("%dst Hello" % i)
    elif i % 10 == 2:
        print("%dnd Hello" % i)
    elif i % 10 == 3:
        print("%drd Hello" % i)
    else:
        print("%dth Hello" % i)

Answer 10: (score: 1)

Here's an alternative option using the num2words package.

>>> from num2words import num2words
>>> num2words(42, to='ordinal_num')
    '42nd'

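Answer 11: (score: 1)

This can handle numbers of any length, the exceptions for ...#11 to ...#13, and negative integers. Note that it returns only the suffix.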

def ith(i):return(('th'*(10<(abs(i)%100)<14))+['st','nd','rd',*['th']*7][(abs(i)-1)%10])[0:2]

I suggest using ith() as the name, to avoid overriding the builtin ord().

# test routine
for i in range(-200,200):
    print(i,ith(i))

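Note: tested with Python 3.6; the abs() function is built in, so no math module needs to be imported.

Answer 12: (score: 1)

If you don't want to import an external module and prefer a one-line solution, then the following is probably (slightly) more readable than the accepted answer: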

def suffix(i):
    return {1:"st", 2:"nd", 3:"rd"}.get(i%10*(i%100 not in [11,12,13]), "th")

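It uses dictionary .get, as suggested by https://codereview.stackexchange.com/a/41300/90593 and https://stackoverflow.com/a/36977549/5069869

I used multiplication with a boolean to handle the special cases (11, 12, 13) without having to start an if-block. If the condition (i%100 not in [11,12,13]) evaluates to False, the whole product is 0 and we get the default "th" case.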
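To make the boolean multiplication concrete, the sketch below prints the dictionary key that a few inputs produce. The helper name suffix_key is made up for this illustration.

```python
def suffix_key(i):
    # True counts as 1 and False as 0, so the product is either i % 10 or 0.
    return i % 10 * (i % 100 not in [11, 12, 13])


for i in (1, 2, 3, 4, 11, 12, 13, 21, 112):
    key = suffix_key(i)
    # Keys other than 1, 2 and 3 are missing from the dict and fall back to "th".
    print(i, key, {1: "st", 2: "nd", 3: "rd"}.get(key, "th"))
```

For 11, 12, 13 and 112 the key collapses to 0, which is why they land in the default branch even though their last digit is 1, 2 or 3.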

Answer 13: (score: 1)

This function works for any number n. If n is negative, it is converted to positive. If n is not an integer, it is converted to an integer.

def ordinal( n ):

    suffix = ['th', 'st', 'nd', 'rd', 'th', 'th', 'th', 'th', 'th', 'th']

    if n < 0:
        n *= -1

    n = int(n)

    if n % 100 in (11,12,13):
        s = 'th'
    else:
        s = suffix[n % 10]

    return str(n) + s

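Answer 14: (score: 0)

Gareth's code expressed using the modern .format():

# n//10 keeps the tens-digit test correct on both Python 2 and Python 3
ordinal = lambda n: "{}{}".format(n,"tsnrhtdd"[(n//10%10!=1)*(n%10<4)*n%10::4])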

Answer 15: (score: 0)

I take my hat off to Gareth's lambda code. So elegant. I only half understand how it works, so I tried to deconstruct it and came up with this:

def ordinal(integer):

    int_to_string = str(integer)

    # single digits first, so that [-2] below never indexes past a one-character string
    if int_to_string == '1' or int_to_string == '-1':
        print(int_to_string+'st')
        return int_to_string+'st'
    elif int_to_string == '2' or int_to_string == '-2':
        print(int_to_string+'nd')
        return int_to_string+'nd'
    elif int_to_string == '3' or int_to_string == '-3':
        print(int_to_string+'rd')
        return int_to_string+'rd'

    # last digit 1, 2 or 3, unless the tens digit is 1 (11, 12, 13)
    elif int_to_string[-1] == '1' and int_to_string[-2] != '1':
        print(int_to_string+'st')
        return int_to_string+'st'
    elif int_to_string[-1] == '2' and int_to_string[-2] != '1':
        print(int_to_string+'nd')
        return int_to_string+'nd'
    elif int_to_string[-1] == '3' and int_to_string[-2] != '1':
        print(int_to_string+'rd')
        return int_to_string+'rd'

    else:
        print(int_to_string+'th')
        return int_to_string+'th'


>>> print([ordinal(n) for n in range(1,25)])
1st
2nd
3rd
4th
5th
6th
7th
8th
9th
10th
11th
12th
13th
14th
15th
16th
17th
18th
19th
20th
21st
22nd
23rd
24th
['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th',             
'11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th', 
'20th', '21st', '22nd', '23rd', '24th']

Answer 16: (score: 0)

There's an ordinal function in humanize

pip install humanize

>>> [(x, humanize.ordinal(x)) for x in (1, 2, 3, 4, 20, 21, 22, 23, 24, 100, 101,
...                                     102, 103, 113, -1, 0, 1.2, 13.6)]
[(1, '1st'), (2, '2nd'), (3, '3rd'), (4, '4th'), (20, '20th'), (21, '21st'),
 (22, '22nd'), (23, '23rd'), (24, '24th'), (100, '100th'), (101, '101st'),
 (102, '102nd'), (103, '103rd'), (113, '113th'), (-1, '-1th'), (0, '0th'),
 (1.2, '1st'), (13.6, '13th')]

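Answer 17: (score: 0)

Here is a more complicated solution I just wrote that takes compound ordinals into account, so it works from first all the way to nine hundred and ninety ninth. I needed it to convert street names given as strings into numeric ordinals: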

import re
from collections import OrderedDict

ONETHS = {
    'first': '1ST', 'second': '2ND', 'third': '3RD', 'fourth': '4TH', 'fifth': '5TH', 'sixth': '6TH', 'seventh': '7TH',
    'eighth': '8TH', 'ninth': '9TH'
}

TEENTHS = {
    'tenth': '10TH', 'eleventh': '11TH', 'twelfth': '12TH', 'thirteenth': '13TH',
    'fourteenth': '14TH', 'fifteenth': '15TH', 'sixteenth': '16TH', 'seventeenth': '17TH', 'eighteenth': '18TH',
    'nineteenth': '19TH'
}

TENTHS = {
    'twentieth': '20TH', 'thirtieth': '30TH', 'fortieth': '40TH', 'fiftieth': '50TH', 'sixtieth': '60TH',
    'seventieth': '70TH', 'eightieth': '80TH', 'ninetieth': '90TH',
}

HUNDREDTH = {'hundredth': '100TH'}  # HUNDREDTH not s

ONES = {'one': '1', 'two': '2', 'three': '3', 'four': '4', 'five': '5', 'six': '6', 'seven': '7', 'eight': '8',
        'nine': '9'}

TENS = {'twenty': '20', 'thirty': '30', 'forty': '40', 'fifty': '50', 'sixty': '60', 'seventy': '70', 'eighty': '80',
        'ninety': '90'}

HUNDRED = {'hundred': '100'}

# Used below for ALL_ORDINALS
ALL_THS = {}
ALL_THS.update(ONETHS)
ALL_THS.update(TEENTHS)
ALL_THS.update(TENTHS)
ALL_THS.update(HUNDREDTH)

ALL_ORDINALS = OrderedDict()
ALL_ORDINALS.update(ALL_THS)
ALL_ORDINALS.update(TENS)
ALL_ORDINALS.update(HUNDRED)
ALL_ORDINALS.update(ONES)


def split_ordinal_word(word):
    ordinals = []
    if not word:
        return ordinals 

    for key, value in ALL_ORDINALS.items():
        if word.startswith(key):
            ordinals.append(key)
            ordinals += split_ordinal_word(word[len(key):])
            break
    return ordinals

def get_ordinals(s):
    ordinals, start, end = [], [], []
    s = s.strip().replace('-', ' ').replace('and', '').lower()
    s = re.sub(' +',' ', s)  # Replace multiple spaces with a single space
    s = s.split(' ')

    for word in s:
        found_ordinals = split_ordinal_word(word)
        if found_ordinals:
            ordinals += found_ordinals
        else:  # else if word, for covering blanks
            if ordinals:  # Already have some ordinals
                end.append(word)
            else:
                start.append(word)
    return start, ordinals, end


def detect_ordinal_pattern(ordinals):
    ordinal_length = len(ordinals)
    ordinal_string = '' # ' '.join(ordinals)
    if ordinal_length == 1:
        ordinal_string = ALL_ORDINALS[ordinals[0]]
    elif ordinal_length == 2:
        if ordinals[0] in ONES.keys() and ordinals[1] in HUNDREDTH.keys():
            ordinal_string = ONES[ordinals[0]] + '00TH'
        elif ordinals[0] in HUNDRED.keys() and ordinals[1] in ONETHS.keys():
            ordinal_string = HUNDRED[ordinals[0]][:-1] + ONETHS[ordinals[1]]
        elif ordinals[0] in TENS.keys() and ordinals[1] in ONETHS.keys():
            ordinal_string = TENS[ordinals[0]][0] + ONETHS[ordinals[1]]
    elif ordinal_length == 3:
        if ordinals[0] in HUNDRED.keys() and ordinals[1] in TENS.keys() and ordinals[2] in ONETHS.keys():
            ordinal_string = HUNDRED[ordinals[0]][0] + TENS[ordinals[1]][0] + ONETHS[ordinals[2]]
        elif ordinals[0] in ONES.keys() and ordinals[1] in HUNDRED.keys() and ordinals[2] in ALL_THS.keys():
            # zero-pad so that e.g. 'one hundred third' gives '103RD', not '13RD'
            ordinal_string =  ONES[ordinals[0]] + ALL_THS[ordinals[2]].zfill(4)
    elif ordinal_length == 4:
        if ordinals[0] in ONES.keys() and ordinals[1] in HUNDRED.keys() and ordinals[2] in TENS.keys() and \
           ordinals[3] in ONETHS.keys():
                ordinal_string = ONES[ordinals[0]] + TENS[ordinals[2]][0] + ONETHS[ordinals[3]]

    return ordinal_string

Here is some sample usage:

# s = '32 one   hundred and forty-third st toronto, on'
#s = '32 forty-third st toronto, on'
#s = '32 one-hundredth st toronto, on'
#s = '32 hundred and third st toronto, on'
#s = '32 hundred and thirty first st toronto, on'
# s = '32 nine hundred and twenty third st toronto, on'
#s = '32 nine hundred and ninety ninth st toronto, on'
s = '32 sixty sixth toronto, on'

st, ords, en = get_ordinals(s)
print(st, detect_ordinal_pattern(ords), en)