Bash script/utility to convert British English to American spellings in a TeX document

Asked: 2016-05-31 23:39:00

Tags: bash awk sed latex spelling

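I'm looking for a quick Bash script to convert British/New Zealand spellings to American ones in a TeX document (for collaborating with US academics and for journal submissions). It is a formal mathematical biology paper with almost no regional terminology or grammar: prior work is cited as formulae rather than as quotations.

For example:

Generalise -> Generalize

Colour -> Color

Centre -> Center

Surely there must be a sed- or awk-based script that replaces most of the common spelling differences.

For more details, see the related question on the TeX forum: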

https://tex.stackexchange.com/questions/312138/converting-uk-to-us-spellings

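N.B. I currently compile with PDFLaTeX using Kile on Ubuntu 16.04 or Elementary OS 0.3 Freya, but I can switch to another TeX compiler/package if there is a built-in fix elsewhere.

Thanks for your help.

2 answers:

Answer 0: (score: 0)

I think you need a list of replacements and a script that applies it as a translation. Each line of the dictionary file holds the word to find followed by its replacement, and you will have to extend that file for the translation to be effective.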

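#!/bin/bash
sourceFile=$1
dict=$2

# Each dictionary line is "<word to find> <replacement>"
while read -r word updatedWord _
    do
     # skip blank or incomplete lines
     [ -n "$updatedWord" ] || continue

     # replace every occurrence (substrings included) in place
     sed -i "s/$word/$updatedWord/g" "$sourceFile" 2> /dev/null

   done < "$dict"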

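Run the script like this:

./scriptName source.txt dictionary.txt 

Here is a sample dictionary I used. The American spelling comes first and the British one second, so with this file the script converts American to British; swap the columns for the opposite direction:

> cat dict.txt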
characterize characterise
prioritize prioritise
specialize specialise
analyze analyse
catalyze catalyse
size size
exercise exercise
behavior behaviour
color colour
favor favour
contour contour
center centre
fiber fibre
liter litre
parameter parameter
ameba amoeba
anesthesia anaesthesia
diarrhea diarrhoea
esophagus oesophagus
leukemia leukaemia
cesium caesium
defense defence
practice  practice
license  licence
defensive defensive
advice  advice
aging ageing
acknowledgment acknowledgement
judgment judgement
analog analogue
dialog dialogue
fulfill fulfil
enroll enrol
skill skill
skillful skilful
labeled labelled
signaling signalling
propelled propelled
revealing revealing
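
Entries such as `size size` map a word to itself, and a line with more or fewer than two fields will not be read as intended, so it may be worth checking the dictionary before using it. The following is only a sketch: `check_dict` is a hypothetical helper that is not part of the answer's script, and it assumes whitespace-separated fields.

```shell
#!/bin/bash
# check_dict: hypothetical helper, not part of the script above.
# Reports lines that do not have exactly two fields, and lines whose
# two fields are identical (they replace a word with itself).
check_dict() {
    awk '
        NF == 0  { next }   # ignore blank lines
        NF != 2  { printf "line %d: expected 2 fields, got %d: %s\n", NR, NF, $0; next }
        $1 == $2 { printf "line %d: no-op entry: %s\n", NR, $0 }
    ' "$1"
}

# Small demonstration on a throwaway dictionary.
demo_dict=$(mktemp)
printf '%s\n' 'color colour' 'size size' 'fulfill fulfil extra' > "$demo_dict"
check_dict "$demo_dict"
rm -f "$demo_dict"
```

No-op entries are harmless for a plain substring replacement, so whether to delete them is a judgment call; the report just makes them visible.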

Result of a run:

cat source.txt
color of this fiber is great and we should analyze it.

./scriptName source.txt dict.txt

cat source.txt
colour of this fibre is great and we should analyse it.
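
The question asks for the opposite direction (British to American), and running sed once per dictionary entry also replaces substrings inside longer words. One possible variation, shown here only as a sketch, builds a single sed script from the same dictionary with the columns swapped and applies it in one pass. The name `uk_to_us` is hypothetical, the `\b` word-boundary escape assumes GNU sed (as on Ubuntu), and the sketch does not protect LaTeX command or environment names.

```shell
#!/bin/bash
# uk_to_us: hypothetical helper. $1 = dictionary ("american british" per line),
# $2 = text file. Prints the converted text to stdout; the file is not modified.
uk_to_us() {
    uk_to_us_script=$(mktemp) || return 1
    # Emit two sed commands per entry: lower-case and capitalised.
    # Entries that are not two different, purely lower-case words are skipped.
    awk '
        NF == 2 && $1 != $2 && $1 ~ /^[a-z]+$/ && $2 ~ /^[a-z]+$/ {
            printf "s/\\b%s\\b/%s/g\n", $2, $1
            printf "s/\\b%s\\b/%s/g\n", cap($2), cap($1)
        }
        function cap(s) { return toupper(substr(s, 1, 1)) substr(s, 2) }
    ' "$1" > "$uk_to_us_script"
    sed -f "$uk_to_us_script" "$2"    # \b is a GNU sed extension
    rm -f "$uk_to_us_script"
}

# Small demonstration with throwaway files.
demo_dict=$(mktemp); demo_src=$(mktemp)
printf '%s\n' 'color colour' 'center centre' 'size size' > "$demo_dict"
printf '%s\n' 'Colour the centre; see \colourbox.' > "$demo_src"
uk_to_us "$demo_dict" "$demo_src"
rm -f "$demo_dict" "$demo_src"
```

Because of the word boundaries, inflected forms such as "colours" are left alone unless they have their own dictionary entries.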

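Answer 1: (score: 0)

I think my awk solution is more flexible than the sed one. The program leaves LaTeX commands untouched (words that start with "\") and preserves a capitalised first letter. Arguments of LaTeX commands (and ordinary text) are substituted according to the dictionary file. When the optional third argument [rev] is given, it performs the reverse substitution with the same dictionary file. Any non-alphabetic character acts as a word separator (this is necessary in LaTeX source files). The program writes its output to the screen (stdout), so you need to redirect it to a file (> output_f). (I assume the input encoding of your LaTeX source is 1 byte per character.)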

> cat dic.sh
#!/bin/bash
(($#<2))&& { echo "Usage $0 dictionary_file latex_file [rev]"; exit 1; }
((d= $#==3 ? 0:1))     # d=1: column 1 -> column 2; d=0 (third argument given): reverse
awk -v d=$d '
 BEGIN {cm=fx=0; fn="";}
 fn!=FILENAME {fx++; fn=FILENAME;}                        # count input files: 1 = dictionary, 2 = LaTeX source
 fx==1 {if(!NF)next; if(d)a[$1]=$2; else a[$2]=$1; next;} # read the dictionary (or its reverse) into an associative array
 fx==2 { for(i=1; i<=length($0); i++)
            {c=substr($0,i,1);                            # read the LaTeX source line one character at a time
             if(cm){printf("%s",c); if(c!~/[A-Za-z0-9\\]/)cm=0;}  # inside a LaTeX command name: copy it through unchanged
             else if(c~/[A-Za-z]/)w=w c; else{pr(); printf("%s",c); if(c=="\\")cm=1;} # collect letters into a word, or flush the word at a separator
            }
         pr(); printf("\n"); cm=0;                        # flush the last word on the line; a command name ends at the line end
       }
function pr(  s){   # print the collected word, or its dictionary substitute with the first-letter case restored
   if(!length(w))return;
   s=tolower(w);
   if(!(s in a))printf("%s",w);
   else printf("%s", s==w ? a[s] : toupper(substr(a[s],1,1)) substr(a[s],2));
   w="";}
' "$1" "$2"

Dictionary file:

> cat dictionary
apple      lemon
raspberry  cherry
pear       banana

Input LaTeX source:

> cat src.txt
Apple123pear,apple "pear".
\Apple123pear{raspberry}{pear}[apple].

Raspberry12Apple,pear.

Result of a run:

> ./dic.sh 
Usage ./dic.sh dictionary_file latex_file [rev]

> ./dic.sh dictionary src.txt >out1.txt; cat out1.txt
Lemon123banana,lemon "banana".
\Apple123pear{cherry}{banana}[lemon].

Cherry12Lemon,banana.

> ./dic.sh dictionary out1.txt >out2.txt rev; cat out2.txt
Apple123pear,apple "pear".
\Apple123pear{raspberry}{pear}[apple].

Raspberry12Apple,pear.

> diff src.txt out2.txt   # they are identical