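How do I change the case in a situation like this?

Time: 2013-02-07 06:48:41

Tags: regex sed

I'd like to use sed or something similar on a text file to change every all-uppercase phrase to lowercase wrapped in \textsc{}.

For example: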

THIS SENTENCE IS ALL CAPS except not really

should become

\textsc{this sentence is all caps} except not really

whereas

This Sentence Has Many Caps

should stay as

This Sentence Has Many Caps  

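With the pattern s/\(.[A-Z]*\)/textsc{\L\1}/, only the first word of the string gets changed.

Can anyone point me to a correct way to do this?

Update: the regex should also cope with apostrophes, as in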

I'll BUY YOU A DRINK

Most of the solutions break at the apostrophe, like this: \textsc{i}'ll \textsc{buy you a} \textsc{drink}

3 answers:

Answer 0 (score: 3)

$ cat file
THIS SENTENCE IS ALL CAPS except not really
This Sentence Has Many Caps
THIS SENTENCE Has Many Caps

$ awk -f tst.awk file
\textsc{this sentence is all caps} except not really
This Sentence Has Many Caps
\textsc{this sentence} Has Many Caps

$ cat tst.awk
{
   while ( match( $0, /([[:upper:]]{2,}[[:space:]]*)+/) ) {
      rstart  = RSTART
      rlength = RLENGTH

      if ( match( substr($0,RSTART,RLENGTH), /[[:space:]]+$/) ) {
         rlength = rlength - RLENGTH
      }

      $0 = substr($0,1,rstart-1) \
           "\\textsc{" tolower(substr($0,rstart,rlength)) "}" \
           substr($0,rstart+rlength)
   }

   print
}

Answer 1 (score: 2)

This looks like it should work for you.

echo "THIS sentence IS ALL CAPS Except not really BUT THIS IS" | \
  sed -re "s/\b(([A-Z]+ [A-Z]+)+)\b/\\\textsc{\L\1}/g"

This results in:

THIS sentence \textsc{is all caps} Except not really \textsc{but this is}

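/g makes the substitution global (not just the first match). \b means the phrase must begin and end at a word boundary (not in the middle of a word). The three backslashes before textsc are an escape of an escape: inside double quotes the shell reduces \\\t to \\t, which sed then emits as the literal \textsc. ([A-Z]+ [A-Z]+)+ captures the all-caps phrase. I first tried adding a space to the character class, as in [A-Z ], but that leaves a space before the closing brace, as in \textsc{this sentence }. So I built the space into the middle of the words to form a phrase.

Note that this leaves isolated uppercase words alone. I assume that's intended, since the question is about "phrases". But if you need to replace them too, try this: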
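To see the trailing-space point in isolation, here is a small sketch that runs both patterns on a made-up input. It assumes GNU sed (\b and \L are GNU extensions). The function names wrap_class and wrap_phrase are only for illustration, and LC_ALL=C is added so that [A-Z] means ASCII uppercase regardless of locale.

```shell
# Space inside the character class: the match can end on a space,
# because \b also matches between that space and the next word.
wrap_class() {
  LC_ALL=C sed -E 's/\b[A-Z ]+\b/\\textsc{\L&}/'
}

# Space placed between two runs of capitals: the match always ends on a letter.
wrap_phrase() {
  LC_ALL=C sed -E 's/\b(([A-Z]+ [A-Z]+)+)\b/\\textsc{\L\1}/'
}

printf '%s\n' 'THIS SENTENCE is here' | wrap_class
# \textsc{this sentence }is here

printf '%s\n' 'THIS SENTENCE is here' | wrap_phrase
# \textsc{this sentence} is here
```

The scripts are single-quoted here, so two backslashes are enough to produce the literal \textsc.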

echo "THIS sentence IS ALL CAPS Except not really BUT THIS IS" | \
  sed -re "s/\b((([A-Z]+ [A-Z]+)+)|[A-Z]+)\b/\\\textsc{\L\1}/g"

resulting in

\textsc{this} sentence \textsc{is all caps} Except not really \textsc{but this is}

Answer 2 (score: 1)

This might work for you (GNU sed):

sed -r 's/\b[A-Z]+\b( *\b[A-Z]+\b)*/\\textsc{\L&}/g' file
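For the apostrophe case from the question's update, the command above still wraps the I in I'll, because \b treats the apostrophe as a word boundary. One possible workaround, offered as a sketch rather than a tested general solution, is to drop \b and spell out the boundaries by hand: a run of capitals only counts if the characters on either side are neither alphanumeric nor an apostrophe. It assumes GNU sed. The function name wrap_caps, the LC_ALL=C setting, and the retry loop are my own additions.

```shell
wrap_caps() {
  # '\'' embeds a literal apostrophe in a single-quoted shell string.
  # Group 1 and group 4 are the hand-written boundaries, group 2 is the phrase.
  # The :a ... ta loop reruns the substitution until nothing changes, which
  # picks up a phrase whose leading boundary character was consumed as the
  # trailing boundary of the previous match (for example FOO,BAR).
  LC_ALL=C sed -E \
    -e ':a' \
    -e 's/(^|[^[:alnum:]'\''])([A-Z]+( +[A-Z]+)*)($|[^[:alnum:]'\''])/\1\\textsc{\L\2\E}\4/g' \
    -e 'ta'
}

printf '%s\n' "I'll BUY YOU A DRINK" | wrap_caps
# I'll \textsc{buy you a drink}

printf '%s\n' 'THIS SENTENCE IS ALL CAPS except not really' | wrap_caps
# \textsc{this sentence is all caps} except not really

printf '%s\n' 'This Sentence Has Many Caps' | wrap_caps
# This Sentence Has Many Caps
```

Like the one-liner above, this also wraps a single all-caps word, including a lone I or A, so it needs adjusting if only multi-word phrases should be touched.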