CamelCase with accents and acronyms

Question

Using sed, I am trying to split the CamelCase words that occur in a text file. I have come up with code for this, but it fails in some cases.

Input:

IstoÉumTeste
BlaBlaBla
TestingAcronymsABCandAnotherOneKYI

Output:

 Isto Éum Teste
 Bla Bla Bla
 Testing Acronyms A B Cand Another One K Y I

It fails when accented characters or acronyms get in the way.

Expected output:

Isto É um Teste
Bla Bla Bla
Testing Acronyms ABC and Another One KYI

Edit:
In my case, I use the Portuguese special characters: àáéãõçÀÁÉÃÕÇ

Answer 1

Here is one option:

$ sed -E -e 's/([[:lower:]])([[:upper:]])/\1 \2/g' -e 's/([[:upper:]]{2,})([[:lower:]])/\1 \2/g' input.txt
Isto Éum Teste
Bla Bla Bla
Testing Acronyms ABC and Another One KYI

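First, to handle Unicode characters, we should use the [:upper:] and [:lower:] character classes instead of the ASCII-only [A-Z], because they cover all uppercase and lowercase Unicode characters (at least on sed implementations that support Unicode, such as GNU sed, when run under a UTF-8 locale).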
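To illustrate why the ASCII range falls short, here is a small Python sketch. It is Python rather than sed, purely for illustration; the variable names are made up for this example, and str.isupper() is used as a rough stand-in for a Unicode-aware [[:upper:]].

```python
import re

sample = "IstoÉumTeste"

# ASCII-only range: the accented capital is not matched.
ascii_upper = re.findall(r"[A-Z]", sample)

# Unicode-aware check via str.isupper(): the accented capital is included.
unicode_upper = [ch for ch in sample if ch.isupper()]

print(ascii_upper)    # ['I', 'T']
print(unicode_upper)  # ['I', 'É', 'T']
```

With [A-Z] the boundary before É is never seen, which is why the accented word is not split.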

Second, to handle acronyms (and the leading space), we can break the problem into two subproblems: (1) split on lower-to-UPPER boundaries, and (2) split on ACRONYM-to-word boundaries.
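For readers who are not using sed, the two passes can be sketched in Python roughly as follows. This is an illustrative port, not the sed command itself: the function name split_camel_case is hypothetical, str.islower() and str.isupper() stand in for [[:lower:]] and [[:upper:]], and accented letters are assumed to be single precomposed code points.

```python
def split_camel_case(text: str) -> str:
    """Two-pass split, loosely mirroring the two sed expressions."""
    # Pass 1: insert a space at every lower -> UPPER boundary.
    pass1 = []
    for i, ch in enumerate(text):
        if i > 0 and text[i - 1].islower() and ch.isupper():
            pass1.append(" ")
        pass1.append(ch)
    s = "".join(pass1)

    # Pass 2: insert a space where a run of two or more uppercase
    # letters is directly followed by a lowercase letter.
    pass2 = []
    for i, ch in enumerate(s):
        if i >= 2 and s[i - 2].isupper() and s[i - 1].isupper() and ch.islower():
            pass2.append(" ")
        pass2.append(ch)
    return "".join(pass2)


for line in ["IstoÉumTeste", "BlaBlaBla", "TestingAcronymsABCandAnotherOneKYI"]:
    print(split_camel_case(line))
```

Like the sed command, this leaves "Éum" unsplit, because a single uppercase letter followed by lowercase letters looks like an ordinary capitalized word.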

Answer 2

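Brief

This one was a head-scratcher, and I don't know whether it will work in sed, since it is a PCRE regex. Either way, it may help others with similar requests in other languages, so I'm posting it as a potential answer.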

Code

((?(?=[A-Z])(?:[A-Z]\p{Ll}+|[A-Z]+)|\p{Lu})|\p{Ll}+)

Results

Input

IstoÉumTeste
BlaBlaBla
TestingAcronymsABCandAnotherOneKYI

Output

Isto É um Teste 
Bla Bla Bla 
Testing Acronyms ABC and Another One KYI

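Explanation

((?(?=[A-Z])(?:[A-Z]\p{Ll}+|[A-Z]+)|\p{Lu})|\p{Ll}+) Capture either of the following
- (?(?=[A-Z])(?:[A-Z]\p{Ll}+|[A-Z]+)|\p{Lu}) If clause specifying the following
  - (?=[A-Z]) If what follows is an ASCII uppercase letter
  - If true: (?:[A-Z]\p{Ll}+|[A-Z]+) Match either of the following
    - [A-Z]\p{Ll}+ Match an ASCII uppercase letter followed by one or more lowercase letters (in any language)
    - [A-Z]+ Match one or more ASCII uppercase letters
  - If false: \p{Lu} Match a single uppercase letter (in any language)
- \p{Ll}+ Match one or more lowercase letters (in any language)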
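The branches listed above can also be written out by hand in a language whose standard regex engine lacks \p{...} and conditionals. The following Python sketch is one such hand-rolled equivalent, not the regex itself: the helper names (is_ll, is_lu, is_ascii_upper, tokenize) are hypothetical, unicodedata.category() stands in for \p{Ll} and \p{Lu}, and joining the tokens with a single space is an assumption about how the matches are meant to be used. Characters that match none of the branches are simply dropped here.

```python
import unicodedata


def is_ll(ch: str) -> bool:
    return unicodedata.category(ch) == "Ll"  # stand-in for \p{Ll}


def is_lu(ch: str) -> bool:
    return unicodedata.category(ch) == "Lu"  # stand-in for \p{Lu}


def is_ascii_upper(ch: str) -> bool:
    return "A" <= ch <= "Z"  # stand-in for [A-Z]


def tokenize(text: str) -> list[str]:
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        j = i + 1
        if is_ascii_upper(ch):
            # Try [A-Z]\p{Ll}+ first.
            while j < n and is_ll(text[j]):
                j += 1
            if j == i + 1:
                # No lowercase letter followed: fall back to [A-Z]+.
                while j < n and is_ascii_upper(text[j]):
                    j += 1
        elif is_lu(ch):
            pass  # \p{Lu}: a single non-ASCII uppercase letter
        elif is_ll(ch):
            # \p{Ll}+: a run of lowercase letters
            while j < n and is_ll(text[j]):
                j += 1
        else:
            i += 1  # matches no branch; skipped in this sketch
            continue
        tokens.append(text[i:j])
        i = j
    return tokens


for line in ["IstoÉumTeste", "BlaBlaBla", "TestingAcronymsABCandAnotherOneKYI"]:
    print(" ".join(tokenize(line)))
```

Because É is not in [A-Z], it falls through to the single-uppercase branch and becomes its own token, which is what separates "É" from "um".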

Edit

Code

([A-Z]\p{Ll}+|[A-Z]+|\p{Lu}|\p{Ll}+)

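I was clearly overthinking it at first. This reduced version is likely to be better supported, because it avoids the conditional construct, which not every regex engine implements.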
