Question

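I want to capitalize the first letter of the first word of every sentence in a whole paragraph (str). The problem is that all the characters are lowercase.

I tried something like this: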

text = "here a long. paragraph full of sentences. what in this case does not work. i am lost" 
re.sub(r'(\b\. )([a-zA-z])', r'\1' (r'\2').upper(), text)

I expect this:

"Here a long. Paragraph full of sentences. What in this case does not work. I am lost."

Answer 1

You can use re.sub with a lambda:

import re
text = "here a long. paragraph full of sentences. what in this case does not work. i am lost" 
result = re.sub(r'(?<=^)\w|(?<=\.\s)\w', lambda x: x.group().upper(), text)

Output:

'Here a long. Paragraph full of sentences. What in this case does not work. I am lost'

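Regex explanation:

(?<=^)\w: matches a word character (letter, digit, or underscore) at the start of the string.

(?<=\.\s)\w: matches a word character preceded by a period and a whitespace character.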
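The same lookbehind idea can be wrapped in a small helper. This is only a sketch: the name capitalize_sentences and the extra "!" and "?" terminators are assumptions added for illustration and are not part of the answer.

```python
import re

# Hypothetical helper: the names and the extra "!" and "?" terminators are
# assumptions for illustration, not part of the original answer.
SENTENCE_START = re.compile(r'^\w|(?<=[.!?]\s)\w')

def capitalize_sentences(text):
    # Uppercase the first word character of the string, and any word
    # character directly preceded by ".", "!" or "?" plus one whitespace.
    return SENTENCE_START.sub(lambda m: m.group().upper(), text)

print(capitalize_sentences("here a long. paragraph full of sentences! is it working? i am lost"))
# → Here a long. Paragraph full of sentences! Is it working? I am lost
```

Because a lookbehind in Python's re module must have a fixed width, this pattern only recognizes exactly one whitespace character after the terminator. A sentence that follows two or more spaces is left in lowercase.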

Answer 2

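You can use the regex ((?:^|\.\s)\s*)([a-z]). It does not depend on lookarounds, which may not be available in the regex dialect you are using, so it is simpler and more widely supported. (JavaScript, for example, supports lookbehind as of ECMAScript 2018, but that support is not yet widespread.) The regex captures in group 1 either any whitespace at the start of the string or a literal dot . followed by one or more whitespace characters, and then captures a lowercase letter in group 2 with ([a-z]). A lambda expression replaces each match with the group 1 text followed by the uppercased group 2 letter. Check this Python code: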

import re

arr = ['here a long.   paragraph full of sentences. what in this case does not work. i am lost',
       '   this para contains more than one space after period and also has unneeded space at the start of string.   here a long.   paragraph full of sentences.  what in this case does not work. i am lost']

for s in arr:
    print(re.sub(r'(^\s*|\.\s+)([a-z])', lambda m: m.group(1) + m.group(2).upper(), s))

Output:

Here a long.   Paragraph full of sentences. What in this case does not work. I am lost
   This para contains more than one space after period and also has unneeded space at the start of string.   Here a long.   Paragraph full of sentences.  What in this case does not work. I am lost

If you also want to get rid of the extra whitespace and reduce it to a single space, just take \s* out of group 1 and use the regex ((?:^|\.\s))\s*([a-z]) with this updated Python code:

import re

arr = ['here a long.   paragraph full of sentences. what in this case does not work. i am lost',
       '   this para contains more than one space after period and also has unneeded space at the start of string.   here a long.   paragraph full of sentences.  what in this case does not work. i am lost']

for s in arr:
    print(re.sub(r'((?:^|\.\s))\s*([a-z])', lambda m: m.group(1) + m.group(2).upper(), s))

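Output, with the extra whitespace reduced to a single space, which is often what you want:

Here a long. Paragraph full of sentences. What in this case does not work. I am lost
This para contains more than one space after period and also has unneeded space at the start of string. Here a long. Paragraph full of sentences. What in this case does not work. I am lost

Also, if you were doing this with a PCRE-based regex engine, you could use \U in the replacement string instead of a lambda function and simply replace each match with \1\U\2.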

Regex Demo for PCRE based regex
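
Replacement templates in Python's re module have no case-conversion escape, so in Python the uppercasing step has to go through a callable. A minimal sketch using a named function instead of a lambda follows. The helper names are hypothetical, and the pattern is the whitespace-squeezing one from this answer.

```python
import re

# Hypothetical helper names; the pattern is the whitespace-squeezing
# regex from the answer.
PATTERN = re.compile(r'((?:^|\.\s))\s*([a-z])')

def upper_second_group(match):
    # Keep group 1 unchanged and uppercase the letter captured in group 2,
    # which is what a case-conversion replacement would do in engines
    # that offer one.
    return match.group(1) + match.group(2).upper()

def capitalize_and_squeeze(text):
    # The extra whitespace matched by \s* sits outside both groups,
    # so it is dropped from the result.
    return PATTERN.sub(upper_second_group, text)

print(capitalize_and_squeeze('  here a long.   paragraph.  i am lost'))
# → Here a long. Paragraph. I am lost
```

A named function behaves the same as the lambda, but it is easier to test on its own and to reuse with other patterns.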

Capitalize the first letter of every sentence in a paragraph
