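I want to capitalize the first letter of the first word of each sentence in a whole paragraph (a str) full of sentences. The problem is that all the characters are lowercase.
I tried something like this: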
text = "here a long. paragraph full of sentences. what in this case does not work. i am lost"
re.sub(r'(\b\. )([a-zA-z])', r'\1' (r'\2').upper(), text)
I want this:
"Here a long. Paragraph full of sentences. What in this case does not work. I am lost."
Answer 0: (score: 6)
You can use re.sub with a lambda:
import re
text = "here a long. paragraph full of sentences. what in this case does not work. i am lost"
result = re.sub(r'(?<=^)\w|(?<=\.\s)\w', lambda x: x.group().upper(), text)
Output:
'Here a long. Paragraph full of sentences. What in this case does not work. I am lost'
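Regex explanation:
(?<=^)\w: matches a word character (letter, digit, or underscore) at the very start of the string.
(?<=\.\s)\w: matches a word character that is preceded by a period and one whitespace character.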
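To make the two alternatives easier to see in isolation, here is a minimal sketch that wraps the same pattern in a function. The names SENTENCE_START and capitalize_sentences are made up for illustration and are not part of the answer. It also shows one limitation that follows from the explanation above: the lookbehind checks for a period plus exactly one whitespace character, so a sentence that follows two spaces is left untouched.

```python
import re

# Same pattern as in the answer; the constant and function names are
# hypothetical, chosen only for this illustration.
SENTENCE_START = re.compile(r'(?<=^)\w|(?<=\.\s)\w')

def capitalize_sentences(text):
    """Uppercase the first word character of the string and of every
    sentence that follows a period plus exactly one whitespace character."""
    return SENTENCE_START.sub(lambda m: m.group().upper(), text)

# The third sentence follows two spaces, so the lookbehind does not match it.
print(capitalize_sentences("first one. second one.  third one"))
# First one. Second one.  third one
```

If the input may contain runs of whitespace after a period, a capture-group pattern that allows \s+ is probably the easier route.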
Answer 1: (score: 0)
You can use the regex ((?:^|\.\s)\s*)([a-z]), which does not depend on lookarounds. Lookarounds are not always available in the regex dialect you are using, so this pattern is simpler and more widely supported. For example, lookbehind in JavaScript was added in ECMAScript 2018 and was not yet widely supported at the time of writing.
Group 1 captures either zero or more whitespace characters at the start of the string, or a literal dot . followed by one or more whitespace characters. Group 2 captures a single lowercase letter with ([a-z]). A lambda expression then replaces the match with the text captured in group 1 followed by the letter captured in group 2, converted to uppercase. Check this Python code:
import re
arr = ['here a long. paragraph full of sentences. what in this case does not work. i am lost',
       ' this para contains more than one space after period and also has unneeded space at the start of string. here a long. paragraph full of sentences. what in this case does not work. i am lost']
for s in arr:
    print(re.sub(r'(^\s*|\.\s+)([a-z])', lambda m: m.group(1) + m.group(2).upper(), s))
Output:
Here a long. Paragraph full of sentences. What in this case does not work. I am lost
 This para contains more than one space after period and also has unneeded space at the start of string. Here a long. Paragraph full of sentences. What in this case does not work. I am lost
If you want to get rid of the extra whitespace and reduce it to a single space, take the \s* out of group 1 and use this regex:
((?:^|\.\s))\s*([a-z])
with the updated Python code:
import re
arr = ['here a long. paragraph full of sentences. what in this case does not work. i am lost',
       ' this para contains more than one space after period and also has unneeded space at the start of string. here a long. paragraph full of sentences. what in this case does not work. i am lost']
for s in arr:
    print(re.sub(r'((?:^|\.\s))\s*([a-z])', lambda m: m.group(1) + m.group(2).upper(), s))
This gives the following, where the extra whitespace is reduced to a single space, which is often what you want:
Here a long. Paragraph full of sentences. What in this case does not work. I am lost
This para contains more than one space after period and also has unneeded space at the start of string. Here a long. Paragraph full of sentences. What in this case does not work. I am lost
Also, if you were doing this with a PCRE-based regex engine, you could use a case-conversion escape such as \U in the replacement itself instead of a lambda function, and replace the match with something like \1\U\2.
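As far as I know, Python's re module has no case-conversion escape in replacement strings, so there the callable replacement remains the way to go. Below is a rough sketch that puts both capture-group patterns from this answer behind one function. The function name and the collapse_whitespace flag are assumptions made for illustration only.

```python
import re

# Both patterns are taken from the answer; the names are hypothetical.
KEEP_WS = re.compile(r'(^\s*|\.\s+)([a-z])')          # whitespace stays in group 1
COLLAPSE_WS = re.compile(r'((?:^|\.\s))\s*([a-z])')   # extra whitespace falls outside group 1

def capitalize_sentences(text, collapse_whitespace=False):
    """Uppercase the lowercase letter that starts each sentence.

    With collapse_whitespace=True, leading whitespace is dropped and runs
    of whitespace after a period shrink to one character.
    """
    pattern = COLLAPSE_WS if collapse_whitespace else KEEP_WS
    # Re-emit group 1 unchanged and uppercase the letter held in group 2.
    return pattern.sub(lambda m: m.group(1) + m.group(2).upper(), text)

sample = "  one.   two. three"
print(repr(capitalize_sentences(sample)))                            # '  One.   Two. Three'
print(repr(capitalize_sentences(sample, collapse_whitespace=True)))  # 'One. Two. Three'
```

Both patterns only look at periods, so sentences ending in ? or ! are not covered by this sketch.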