I want to use sed (or something similar) on a text file to change every all-caps phrase into lowercase wrapped in \textsc{}.
For example:
THIS SENTENCE IS ALL CAPS except not really
should become
\textsc{this sentence is all caps} except not really
whereas
This Sentence Has Many Caps
should stay
This Sentence Has Many Caps
With the pattern s/\(.[A-Z]*\)/textsc{\L\1}/, only the first word of the string changes.
Can anyone point me to a correct way to do this?
Update: the regex pattern should also handle apostrophes, as in
I'll BUY YOU A DRINK
Most solutions break at the I and the ', giving
\textsc{i}'ll \textsc{buy you a} \textsc{drink}
Answer 0 (score: 3)
$ cat file
THIS SENTENCE IS ALL CAPS except not really
This Sentence Has Many Caps
THIS SENTENCE Has Many Caps
$ awk -f tst.awk file
\textsc{this sentence is all caps} except not really
This Sentence Has Many Caps
\textsc{this sentence} Has Many Caps
$ cat tst.awk
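{
    # Loop while the line still has a run of words of 2+ uppercase letters
    # (each optionally followed by whitespace). {2,} needs an awk with
    # interval-expression support, e.g. GNU awk 4.0 or later.
    while ( match( $0, /([[:upper:]]{2,}[[:space:]]*)+/) ) {
        rstart = RSTART
        rlength = RLENGTH
        # Trim trailing whitespace from the match so it stays outside the braces.
        if ( match( substr($0,RSTART,RLENGTH), /[[:space:]]+$/) ) {
            rlength = rlength - RLENGTH
        }
        # Rebuild the line: prefix + \textsc{lowercased phrase} + remainder.
        $0 = substr($0,1,rstart-1) \
             "\\textsc{" tolower(substr($0,rstart,rlength)) "}" \
             substr($0,rstart+rlength)
    }
    print
}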
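Since the question asked for sed, here is a rough GNU sed sketch of the same rule (a caps word must have at least two uppercase letters, and consecutive caps words are wrapped together without a trailing space). It is an approximation rather than a translation of the awk script: it adds word boundaries and only treats spaces, not tabs, as separators. The function name two_letter_caps is hypothetical, and -r, \b and \L are GNU sed features.

```shell
# Hypothetical helper; assumes GNU sed (-r, \b and \L are GNU extensions).
# A caps word = 2+ uppercase letters; runs of such words separated by
# spaces are lowercased and wrapped in a single \textsc{...}.
two_letter_caps() {
  sed -r 's/\b[A-Z]{2,}( +[A-Z]{2,})*\b/\\textsc{\L&}/g'
}

two_letter_caps <<'EOF'
THIS SENTENCE IS ALL CAPS except not really
This Sentence Has Many Caps
THIS SENTENCE Has Many Caps
EOF
# \textsc{this sentence is all caps} except not really
# This Sentence Has Many Caps
# \textsc{this sentence} Has Many Caps
```

One side effect of the two-letter rule: a single-letter word such as A breaks the phrase, so I'll BUY YOU A DRINK comes out as I'll \textsc{buy you} A \textsc{drink}.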
Answer 1 (score: 2)
This looks like it should work for you.
echo "THIS sentence IS ALL CAPS Except not really BUT THIS IS" | \
sed -re "s/\b(([A-Z]+ [A-Z]+)+)\b/\\\textsc{\L\1}/g"
This produces:
THIS sentence \textsc{is all caps} Except not really \textsc{but this is}
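/g makes the replacement global (every match, not just the first). \b means the phrase must start and end at a word boundary (not in the middle of a word). The three backslashes before textsc escape the escape: inside double quotes the shell turns \\\textsc into \\textsc, which sed then outputs as a literal \textsc. ([A-Z]+ [A-Z]+)+ captures the all-caps phrase. I first tried putting a space inside the character class, as in [A-Z ], but that left a space before the closing brace, as in \textsc{this sentence }. So instead I built the space into the middle of the pattern, between words, to form phrases.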
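Note that this leaves isolated all-caps words untouched. I assume that is intended, since the question is about "phrases". But if you need to replace them too, try this: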
echo "THIS sentence IS ALL CAPS Except not really BUT THIS IS" | \
sed -re "s/\b((([A-Z]+ [A-Z]+)+)|[A-Z]+)\b/\\\textsc{\L\1}/g"
which gives:
\textsc{this} sentence \textsc{is all caps} Except not really \textsc{but this is}
Answer 2 (score: 1)
This might work for you (GNU sed):
sed -r 's/\b[A-Z]+\b( *\b[A-Z]+\b)*/\\textsc{\L&}/g' file
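On the question's I'll BUY YOU A DRINK, the one-liner above also wraps the lone I, giving \textsc{i}'ll \textsc{buy you a drink}, because a single capital letter counts as a word. A possible variant, assuming a "phrase" means at least two all-caps words separated by spaces, changes the repetition from * to + so a lone capital is left alone. The trade-off is that isolated all-caps words (such as THIS in THIS sentence) are no longer converted, while something like I AM HERE still is. The function name caps_phrases is hypothetical; GNU sed is assumed.

```shell
# Hypothetical helper; assumes GNU sed (\b and \L are GNU extensions).
# Requires 2+ all-caps words joined by spaces, so the I in I'll is skipped.
caps_phrases() {
  sed -r 's/\b[A-Z]+( +[A-Z]+)+\b/\\textsc{\L&}/g'
}

printf '%s\n' "I'll BUY YOU A DRINK" | caps_phrases
# I'll \textsc{buy you a drink}
```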