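Titlecasing a string with exceptions

Time: 2010-09-16 16:25:36

Tags: python string title-case

Is there a standard way in Python to titlecase a string (i.e. words start with uppercase characters, and all remaining cased characters are lowercase) while leaving articles such as "and" and "in" in lowercase?

9 answers:

Answer 0: (score: 142)

There are a few problems with this. If you use split and join, some whitespace characters will be ignored. The built-in capitalize and title methods do not ignore whitespace.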

>>> 'There     is a way'.title()
'There     Is A Way'
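To make the whitespace point concrete, here is a small sketch (the variable names are only for illustration) comparing split() with no argument against splitting on a single space:

```python
s = 'There     is a way'

# split() with no argument collapses runs of whitespace, so join cannot restore them
collapsed = ' '.join(word.capitalize() for word in s.split())
print(collapsed)   # There Is A Way

# splitting on a single space yields empty strings that stand in for the extra spaces
preserved = ' '.join(word.capitalize() for word in s.split(' '))
print(preserved)   # There     Is A Way
```

The second form only preserves plain spaces; tabs and newlines would still need separate handling.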

If a sentence starts with an article, you do not want the first word of the title to be lowercase.

Keeping these in mind:

import re

def title_except(s, exceptions):
    word_list = re.split(' ', s)       # splitting on single spaces keeps runs of spaces
    final = [word_list[0].capitalize()]  # the first word is always capitalized
    for word in word_list[1:]:
        final.append(word if word in exceptions else word.capitalize())
    return " ".join(final)

articles = ['a', 'an', 'of', 'the', 'is']
print(title_except('there is a    way', articles))
# There is a    Way
print(title_except('a whim   of an elephant', articles))
# A Whim   of an Elephant
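If tabs, newlines or leading whitespace also need to survive, one possible variation is to split with a capturing group so the separators are returned as well. This is only a sketch; title_except_ws is a hypothetical name, not part of the answer above.

```python
import re

def title_except_ws(s, exceptions):
    """Hypothetical variant: capitalize words but keep every whitespace run as-is."""
    # A capturing group makes re.split return the separators too, e.g.
    # 'a\tb' -> ['a', '\t', 'b'], so they can be joined back unchanged.
    parts = re.split(r'(\s+)', s)
    seen_first_word = False
    out = []
    for part in parts:
        if not part or part.isspace():
            out.append(part)              # separator or empty edge piece: keep verbatim
        elif seen_first_word and part in exceptions:
            out.append(part)              # exception word after the first: leave as typed
        else:
            out.append(part.capitalize())
            seen_first_word = True
    return ''.join(out)

articles = ['a', 'an', 'of', 'the', 'is']
print(title_except_ws('a whim\tof  an elephant', articles))
```

Like the original, it relies on capitalize(), so it lower-cases everything after the first letter of each word.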

Answer 1: (score: 47)

Use the titlecase.py module! It works only for English.

>>> from titlecase import titlecase
>>> titlecase('i am a foobar bazbar')
'I Am a Foobar Bazbar'

GitHub: https://github.com/ppannuto/python-titlecase

Answer 2: (score: 20)

There are these methods:

>>> mytext = u'i am a foobar bazbar'
>>> print(mytext.capitalize())
I am a foobar bazbar
>>> print(mytext.title())
I Am A Foobar Bazbar

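There is no option to keep articles lowercase. You would have to code that yourself, probably by using a list of the articles you want to keep in lowercase.

Answer 3: (score: 12)

Stuart Colville has made a Python port of a Perl script written by John Gruber that converts strings into title case but avoids capitalizing small words, based on rules from the New York Times Manual of Style, and also caters for several special cases.

Some of the cleverness of these scripts:

  • They leave small words like if, in, of, on, etc. in lowercase, and will un-capitalize them if they are erroneously capitalized in the input.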

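  • The scripts assume that words with capital letters other than the first character are already correctly capitalized. This means they will leave a word like "iTunes" alone, rather than mangling it into "ITunes" or, worse, "Itunes".

  • They skip over any words with dots in them; "example.com" and "del.icio.us" will remain lowercase.

  • They have hard-coded hacks specifically to deal with odd cases like "AT&T" and "Q&A", both of which contain small words (at and a) that normally should be lowercase.

  • The first and last words of the title are always capitalized, so input such as "Nothing to be afraid of" will be turned into "Nothing to Be Afraid Of".

  • A small word after a colon will be capitalized.

You can download it here.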
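A few of the rules listed above can be sketched in a handful of lines. This is only an illustration, not the code from either script: simple_titlecase and SMALL_WORDS are made-up names, the small-word list is an assumption, and neither punctuation attached to small words nor the hard-coded special cases are handled.

```python
import re

# Illustrative list only; the real scripts ship their own.
SMALL_WORDS = {'a', 'an', 'and', 'as', 'at', 'but', 'by', 'for', 'if',
               'in', 'of', 'on', 'or', 'the', 'to'}

def simple_titlecase(text):
    """Rough sketch of a few of the rules above; not the real titlecase.py."""
    words = text.split(' ')
    out = []
    for i, word in enumerate(words):
        first_or_last = i == 0 or i == len(words) - 1
        after_colon = i > 0 and words[i - 1].endswith(':')
        if re.search(r'\w\.\w', word):
            out.append(word)                 # dotted word such as example.com: skip
        elif word[1:] != word[1:].lower():
            out.append(word)                 # inner capitals (iTunes): assume correct
        elif word.lower() in SMALL_WORDS and not (first_or_last or after_colon):
            out.append(word.lower())         # small word: force lowercase
        else:
            out.append(word[:1].upper() + word[1:])
    return ' '.join(out)

print(simple_titlecase('Nothing to be afraid of'))
print(simple_titlecase('visit example.com on iTunes today'))
```

For anything beyond a quick experiment, the maintained module is likely the safer choice, since it covers many more edge cases.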

Answer 4: (score: 3)

capitalize (word)

This should do it, though I get a different result:

>>> mytext = u'i am a foobar bazbar'
>>> mytext.capitalize()
u'I am a foobar bazbar'
>>>

OK, as said in the reply above, you have to write a custom capitalize:

mytext = u'i am a foobar bazbar'

def xcaptilize(word):
    skipList = ['a', 'an', 'the', 'am']
    if word not in skipList:
        return word.capitalize()
    return word

k = mytext.split(" ")
l = map(xcaptilize, k)
print(" ".join(l))

This outputs

I am a Foobar Bazbar

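Answer 5: (score: 2)

The title method in Python 2.7 has a flaw.

value.title()

will return Carpenter'S Assistant when value is Carpenter's Assistant.

The best solution is probably the one from @BioGeek, using titlecase from Stuart Colville. It is the same solution as the one proposed by @Etienne.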
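The apostrophe behaviour is easy to reproduce, and a regular expression can work around it when pulling in a module is not an option. The sketch below follows the workaround pattern given in the Python documentation for str.title(); titlecase_apostrophes is a name chosen here for illustration.

```python
import re

def titlecase_apostrophes(s):
    # Treat an optional apostrophe suffix as part of the same word, so that
    # capitalize() is applied once per word instead of once per run of letters.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)

value = "carpenter's assistant"
print(value.title())                  # Carpenter'S Assistant
print(titlecase_apostrophes(value))   # Carpenter's Assistant
```

Note that capitalize() still lower-cases the rest of each word, so this does nothing for acronyms or small words.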

Answer 6: (score: 1)

not_these = ['a', 'the', 'of']
thestring = 'the secret of a disappointed programmer'
print(' '.join(word
               if word in not_these
               else word.title()
               for word in thestring.capitalize().split(' ')))
"""Output:
The Secret of a Disappointed Programmer
"""

capitalize() makes the title start with a capitalized word, so that word no longer matches an article in the list and stays capitalized.

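Answer 7: (score: 1)

A one-liner using a list comprehension and the ternary operator:

result = " ".join([word.title() if word not in "the a on in of an".split() else word for word in "Wow, a python one liner for titles".split(" ")])
print(result)

Breakdown:

for word in "Wow, a python one liner for titles".split(" ") splits the string into a list and starts a for loop (inside the list comprehension).

word.title() if word not in "the a on in of an".split() else word uses the native title() method to title-case the word if it is not an article. The string of articles is split into a list so that the test matches whole words rather than substrings.

" ".join joins the list elements with a space as the separator.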
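One detail worth checking in one-liners like this is what the in test is compared against: with a plain string it is a substring test, while with a list it matches whole elements. A small sketch of the difference (one_liner and small_words are hypothetical names):

```python
small_words = "the a on in of an"

print("he" in small_words)            # True: "he" is a substring of "the"
print("he" in small_words.split())    # False: compared against whole words

def one_liner(title, exceptions):
    # Same shape as the one-liner, with the exception container passed in.
    return " ".join(word if word in exceptions else word.title()
                    for word in title.split(" "))

print(one_liner("Wow, he wrote a python one liner", small_words))
# Wow, he Wrote a Python One Liner
print(one_liner("Wow, he wrote a python one liner", small_words.split()))
# Wow, He Wrote a Python One Liner
```

A set would work just as well as a list here and gives faster lookups for longer exception lists.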

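Answer 8: (score: 0)

One important case that has not been considered is acronyms (the python-titlecase solution can handle acronyms if you explicitly provide them as exceptions). I prefer instead to simply avoid down-casing. With this approach, acronyms that are already upper case remain upper case. The following code is a modification of the code originally provided by Dererosaur.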

# This is an attempt to provide an alternative to ''.title() that works with 
# acronyms.
# There are several tricky cases to worry about in typical order of importance:
# 0. Upper case first letter of each word that is not a 'minor' word.
# 1. Always upper case first word.
# 2. Do not down case acronyms
# 3. Quotes
# 4. Hyphenated words: drive-in
# 5. Titles within titles: 2001 A Space Odyssey
# 6. Maintain leading spacing
# 7. Maintain given spacing: This is a test.  This is only a test.

# The following code addresses 0-3 & 7.  It was felt that addressing the others 
# would add considerable complexity.


def titlecase(
    s,
    exceptions = (
        'and', 'or', 'nor', 'but', 'a', 'an', 'the', 'as', 'at', 'by',
        'for', 'in', 'of', 'on', 'per', 'to'
    )
):
    # remove leading and trailing spaces -- needed for first word casing
    # split on single space to maintain word spacing
    words = s.strip().split(' ')

    def upper(s):
        if s:
            if s[0] in '‘“"‛‟' + "'":
                return s[0] + upper(s[1:])
            return s[0].upper() + s[1:]
        return ''

    # always capitalize the first word
    first = upper(words[0])

    return ' '.join([first] + [
        word if word.lower() in exceptions else upper(word)
        for word in words[1:]
    ])


cases = '''
    CDC warns about "aggressive" rats as coronavirus shuts down restaurants
    L.A. County opens churches, stores, pools, drive-in theaters
    UConn senior accused of killing two men was looking for young woman
    Giant asteroid that killed the dinosaurs slammed into Earth at ‘deadliest possible angle,’ study reveals
    Maintain given spacing: This is a test.  This is only a test.
'''.strip().splitlines()

for case in cases:
    print(titlecase(case))

When run, it produces the following:

CDC Warns About "Aggressive" Rats as Coronavirus Shuts Down Restaurants
L.A. County Opens Churches, Stores, Pools, Drive-in Theaters
UConn Senior Accused of Killing Two Men Was Looking for Young Woman
Giant Asteroid That Killed the Dinosaurs Slammed Into Earth at ‘Deadliest Possible Angle,’ Study Reveals
Maintain Given Spacing: This Is a Test.  This Is Only a Test.
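For contrast, here is a short sketch of what the built-in methods do to acronyms and mixed-case names, next to an approach that only touches the first character. The headline string and the upper_first helper are made up for this illustration.

```python
headline = 'UConn senior joins CDC in L.A.'

# Both built-ins lower-case everything after the first letter they capitalize.
print(headline.title())        # Uconn Senior Joins Cdc In L.A.
print(headline.capitalize())   # Uconn senior joins cdc in l.a.

def upper_first(word):
    # Hypothetical helper: change only the first character, leave the rest as typed.
    return word[:1].upper() + word[1:]

kept = ' '.join(upper_first(w) for w in headline.split(' '))
print(kept)                    # UConn Senior Joins CDC In L.A.
```

The trade-off is that input typed in all capitals stays in all capitals, since nothing is ever down-cased.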