Capitalize the first letter of each sentence in a paragraph

Time: 2019-05-18 17:43:28

Tags: python regex python-3.x

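I want to capitalize the first letter of the first word of every sentence in a whole paragraph (a str) full of sentences. The problem is that all the characters are lowercase.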

I tried something like this:

import re

text = "here a long. paragraph full of sentences. what in this case does not work. i am lost"
# Fails: r'\1' (r'\2') tries to call the string r'\1', which raises TypeError
re.sub(r'(\b\. )([a-zA-z])', r'\1' (r'\2').upper(), text)

I expect this:

"Here a long. Paragraph full of sentences. What in this case does not work. I am lost"
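The attempt above fails because r'\1' (r'\2').upper() is evaluated before re.sub ever runs: Python tries to call the string r'\1' and raises TypeError, and even r'\2'.upper() on its own would only uppercase the template text (leaving '\2' unchanged), not the matched letter. A minimal sketch of the same idea, assuming a sentence boundary is either the start of the string or a period followed by a single space, passes a function as the replacement so the uppercasing happens on each match:

```python
import re

text = "here a long. paragraph full of sentences. what in this case does not work. i am lost"

# Group 1: the start of the string or ". "; group 2: the lowercase letter after it.
# The replacement is a function, so .upper() runs on each matched letter.
result = re.sub(r'(^|\. )([a-z])', lambda m: m.group(1) + m.group(2).upper(), text)
print(result)
```

Both answers below rely on this same principle of a callable replacement.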

2 answers:

Answer 0 (score: 6)

You can use re.sub with a lambda:

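import re

text = "here a long. paragraph full of sentences. what in this case does not work. i am lost"
# Uppercase a word character at the start of the string or right after a period and one whitespace character
result = re.sub(r'(?<=^)\w|(?<=\.\s)\w', lambda x: x.group().upper(), text)
print(result)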

Output:

'Here a long. Paragraph full of sentences. What in this case does not work. I am lost'

Regex explanation:

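(?<=^)\w: matches a word character (letter, digit, or underscore) preceded by the start of the string.

(?<=\.\s)\w: matches a word character preceded by a period and a single whitespace character.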
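Because Python's re module only accepts fixed-width lookbehinds, (?<=\.\s) matches exactly one whitespace character after the period, and a variable-width version such as (?<=\.\s+) is rejected with re.error. The sketch below wraps the answer's pattern in a helper (the name capitalize_sentences is hypothetical, not from the answer) and shows that a sentence preceded by several spaces is left unchanged:

```python
import re

# The answer's pattern: a word character at the start of the string,
# or one preceded by a period and exactly one whitespace character.
SENTENCE_START = re.compile(r'(?<=^)\w|(?<=\.\s)\w')


def capitalize_sentences(text):
    """Hypothetical helper: uppercase each character matched by SENTENCE_START."""
    return SENTENCE_START.sub(lambda m: m.group().upper(), text)


print(capitalize_sentences("here a long. paragraph"))    # one space: capitalized
print(capitalize_sentences("here a long.   paragraph"))  # three spaces: not capitalized

# A variable-width lookbehind is not allowed in Python's re module.
try:
    re.compile(r'(?<=\.\s+)\w')
except re.error as exc:
    print("rejected:", exc)
```

This limitation is one reason to prefer a capturing pattern without lookbehind when the spacing between sentences varies.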

Answer 1 (score: 0)

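You can use the ((?:^|\.\s)\s*)([a-z]) regex. It does not depend on lookarounds, which are sometimes unavailable in the regex dialect you are using, so it is simpler and more widely supported. For example, JavaScript does not yet have wide support for lookbehind, even though ECMAScript 2018 added it. Group 1 captures either zero or more whitespace characters at the start of the string, or a literal dot followed by one or more whitespace characters; group 2 then captures the lowercase letter that follows with ([a-z]). A lambda replaces the match with the group 1 text followed by the group 2 letter in uppercase. Check this Python code (it uses the equivalent form (^\s*|\.\s+)([a-z])):

import re

arr = ['here a long.   paragraph full of sentences. what in this case does not work. i am lost',
       '   this para contains more than one space after period and also has unneeded space at the start of string.   here a long.   paragraph full of sentences.  what in this case does not work. i am lost']

for s in arr:
    # Group 1 (leading whitespace, or a dot plus whitespace) is kept as-is; only the letter in group 2 is uppercased
    print(re.sub(r'(^\s*|\.\s+)([a-z])', lambda m: m.group(1) + m.group(2).upper(), s))

Output:

Here a long.   Paragraph full of sentences. What in this case does not work. I am lost
   This para contains more than one space after period and also has unneeded space at the start of string.   Here a long.   Paragraph full of sentences.  What in this case does not work. I am lost

If you want to get rid of the extra whitespace and reduce it to a single space, just take \s* out of group 1 and use this regex, ((?:^|\.\s))\s*([a-z]), with the updated Python code: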

import re

arr = ['here a long.   paragraph full of sentences. what in this case does not work. i am lost',
       '   this para contains more than one space after period and also has unneeded space at the start of string.   here a long.   paragraph full of sentences.  what in this case does not work. i am lost']

for s in arr:
    # Group 1 keeps only "" (start of string) or a dot plus one whitespace character; extra whitespace matched by \s* is dropped
    print(re.sub(r'((?:^|\.\s))\s*([a-z])', lambda m: m.group(1) + m.group(2).upper(), s))

Output, where the extra whitespace is reduced to a single space, which is usually what you want:

Here a long. Paragraph full of sentences. What in this case does not work. I am lost
This para contains more than one space after period and also has unneeded space at the start of string. Here a long. Paragraph full of sentences. What in this case does not work. I am lost

Also, if you are doing this with a PCRE-based regex engine, you can do the uppercasing with a case-conversion escape in the replacement string itself, without needing a lambda function.

Regex Demo for PCRE based regex
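Python's re module does not support case-conversion escapes in the replacement string, so in Python a function replacement is still needed. Neither answer treats ! or ? as sentence terminators; the sketch below extends the second answer's whitespace-collapsing pattern under the assumption that a sentence ends with ., ! or ? followed by whitespace. The helper name capitalize_sentences_normalized is hypothetical:

```python
import re

# Group 1: the start of the string, or a terminator (. ! ?) plus exactly one
# whitespace character; any further whitespace matched by \s* is dropped.
# Group 2: the lowercase letter that begins the sentence.
SENTENCE_START = re.compile(r'((?:^|[.!?]\s))\s*([a-z])')


def capitalize_sentences_normalized(text):
    """Hypothetical helper: capitalize sentence starts and collapse the gap before them."""
    return SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)


print(capitalize_sentences_normalized("   is it lost?   no!  it works. i am happy"))
```

Like the second Python snippet above, this drops leading whitespace at the start of the string; keep \s* inside group 1 instead if the original spacing must be preserved.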