Question

Say I have a names.dict file:

aaren   aa r ah n
abby    ae b iy
....

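I'd like a script that converts the phonetic pronunciation, i.e. everything after the first run of whitespace, to upper case. Note that I'm new to shell, so what I have below is mostly pseudocode.

So far I have this: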

#!/bin/sh

filename=/path/to/names.dict
str temp;
str toUpper;

while read -r line
do
    echo $line > temp  // store the line into a temp string
    regexp="$temp:[[:space:]]*'"  // checks for white space
    //save whatever is after the first white spaces into 'toUpper'
    echo $toUpper | tr [a-z] [A-Z] //this converts the phonetic pronunciation to upper-case

done < "$filename"  //write the Upper-Case string to the original file, replacing the lower-case.

But I'm not sure how to set up the regex match statement.

Edit: link to the names.dict file

Answer 1

More options:
Use sed or perl directly, no loop required:

sed -E 's/(.[^[:blank:]]+)([[:blank:]])(.*)/\1\2\U\3/g' file

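With the regex character class [:blank:] we can match either a space or a tab.

Adding the -i switch to sed applies the changes directly to file (note that \U is a GNU sed extension). The solution above also works with perl: replace sed -E -i with perl -i -pe and keep the same substitution command. The switches must be in that order, because -e has to be followed directly by the program text. The advantage of Perl is that it works the same way on all platforms.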

Using bash:

while read -r f1 f2;do echo "$f1 ${f2^^}";done<file >newfile

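In this case the read command assigns the first field of the input line to the variable $f1 and all remaining fields to the variable $f2. Using the default IFS (space, tab, newline) ensures that the whitespace between f1 and f2 is handled correctly. Note that ${f2^^} requires bash 4 or later.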
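To see how read splits a line, here is a small sketch that only prints the two variables in brackets. The function name show_fields and the sample line are made up for illustration; the upper-casing step is left out so the snippet also runs in a plain POSIX sh.

```shell
# show_fields is a hypothetical helper: it prints how read splits each line.
show_fields() {
    while read -r f1 f2; do
        # f1 gets the first word; f2 gets the rest of the line,
        # with the separating blanks stripped but inner spacing kept.
        printf 'f1=[%s] f2=[%s]\n' "$f1" "$f2"
    done
}

# A line whose fields are separated by a mix of spaces and a tab.
printf 'abby \t  ae b  iy\n' | show_fields
```

The mixed run of spaces and tab between the name and the pronunciation disappears, which is why the bash loop writes a single space between the two fields.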

Test:

$ sed -E 's/(.[^[:blank:]]*)([[:blank:]])(.*)/\1\2\U\3/g' <<<$'one\t\t\ttwo\t  three'
one         TWO   THREE

Testing with your real file, but using \L to convert the data to lower case (the real file is already upper case):

$ curl -sL -o- http://www.speech.cs.cmu.edu/tools/product/1491356679_01827/4320.dict |head |sed -E 's/(.[^[:blank:]]*)([[:blank:]])(.*)/\1\2\L\3/g'
 AAREN  aa r ah n
AARIKA  aa r ah k ah
ABAGAEL ae b ah g iy l
ABAGAIL ah b ae g ey l
ABBE    ae b iy
ABBE(2) ae b ey
ABBEY   ae b iy
ABBI    ae b iy
ABBIE   ae b iy
ABBY    ae b iy

Answer 2

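If your names.dict is going to grow to any significant size, this is not a good way to solve the problem. The shell is slow and expensive. You should use the shell language itself very sparingly and do most of the work inside the programs you call.

For example, you could do this: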

paste <(cut -d\  -f1  names.dict)  <(cut -d\  -f2-  names.dict |tr a-z A-Z )

Or you could use awk:

awk '{ 
     printf "%s ", $1; for(i=2;i<=NF;i++) printf "%s ", toupper($i); printf "\n"; 
}' names.dict

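Iterating many times in a shell script, especially when each iteration performs an expensive operation such as invoking a program (echo $toUpper | tr a-z A-Z) or a redirection (echo $line > temp), is definitely something to avoid if you want to write scripts that perform well.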
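As one way to keep all of the work inside a single program, here is a sketch of an awk variant that also preserves the original separator (tab or spaces) between the name and the pronunciation. The function name upper_tail is hypothetical, and the sketch assumes every line starts with the name rather than with whitespace.

```shell
# upper_tail is a hypothetical helper: one awk process handles the whole input.
upper_tail() {
    awk '{
        # match() finds the first run of blanks and sets RSTART and RLENGTH.
        if (match($0, /[ \t]+/)) {
            head = substr($0, 1, RSTART + RLENGTH - 1)  # name plus separator
            tail = substr($0, RSTART + RLENGTH)         # the pronunciation
            print head toupper(tail)
        } else {
            print                                       # no blanks: leave as is
        }
    }' "$@"
}

printf 'aaren\taa r ah n\nabby    ae b iy\n' | upper_tail
```

Called as upper_tail names.dict > names.upper it reads the file directly; with no argument it reads standard input.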

Edit - sample file:

Your problem is that your sample file mixes tabs and spaces:

 # Assuming you're in an empty working directory
 mkdir workdir && cd $_
 #and you've downloaded the sample
 wget -O sample http://www.speech.cs.cmu.edu/tools/product/1491356679_01827/4320.dict
 # you can downcase it and translate tabs to spaces
 tr 'A-Z\t' 'a-z ' <sample > names.dict

After that, both of the scripts above should work.

Shell script to convert to upper case
