Question

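I need to replace certain characters in a large number of .txt files using custom dictionaries. Every .txt file starts with the same kind of header, following this model: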

#title: That Old Black Magic
#artist: Louis Prima & Keely Smith
#meter: 4/4
#tonic: C

I want to use the "tonic:" information above to decide which dictionary to use. So far, I can modify each file by hand with the following command line:

awk -f script.sh dict0.txt "input.txt" >> "output.txt"

where script.sh is as follows:

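#!/bin/sh
# (run as "awk -f script.sh ...", so awk treats the line above as a comment)

# First file (the dictionary): store each "from to" pair
NR == FNR {
  rep[$1] = $2
  next
}

# Remaining file (the input): apply every replacement, then print the line
{
    for (key in rep) {
      gsub(key, rep[key])
    }
    print
}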

and dict0.txt is the dictionary associated with "tonic: C".

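This process modifies a single file correctly, but it forces me to pick the dictionary by hand and to name every input file. I would like to modify many (700+) files without having to specify which dictionary to use. I created a file called index.txt that says which dictionary each tonic should use. The contents of the index are: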

tonic: B#   dict0
tonic: C    dict0
tonic: C#   dict1
tonic: Db   dict1
tonic: D    dict2
tonic: D#   dict3
tonic: Eb   dict3
tonic: E    dict4
tonic: Fb   dict4
tonic: E#   dict5
tonic: F    dict5
tonic: F#   dict6
tonic: Gb   dict6
tonic: G    dict7
tonic: G#   dict8
tonic: Ab   dict8
tonic: A    dict9
tonic: A#   dict10
tonic: Bb   dict10
tonic: B    dict11
tonic: Cb   dict11

I should also mention that the files are spread across different subfolders of one main folder.

Am I dreaming in technicolor? Is there a less complicated way to do this?

Answer 1

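There are several ways to go about this. Given what you already have and the number of files you mention (unless each dictionary or file is very large), the simplest approach is to wrap your awk in a shell script. You could find all the files for one dictionary type in each pass: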

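for dt in "B#" C "C#" Db ...; do
    # list the files whose header line is exactly "#tonic: $dt"
    find inputs -type f -exec grep -l -x "#tonic: $dt" {} + | while IFS= read -r filename; do
        outname=$(echo "$filename" | sed 's#^inputs/#outs/#')
        mkdir -p "$(dirname "$outname")"
        awk -f script.sh "dicts/$dt" "$filename" > "$outname"
    done
done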

Or, work out which dictionary each file uses:

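find inputs -type f -print | while IFS= read -r filename; do
    # the value of the "#tonic:" header line, e.g. "C#"
    td=$(sed -n 's/^#tonic: *//p' "$filename" | head -n 1)
    outname=$(echo "$filename" | sed 's#^inputs/#outs/#')
    mkdir -p "$(dirname "$outname")"
    awk -f script.sh "dicts/$td" "$filename" > "$outname"
done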

(Note: I haven't tested these; obviously, I don't have your input files.)
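The second loop assumes one dictionary per tonic, stored under the tonic's own name. Since the question already maps tonics to dictionaries in index.txt, here is a minimal sketch of the same per-file idea that looks the dictionary up there instead. It assumes (hypothetically) that index.txt, script.sh and the dictN.txt files sit in the current directory, that the songs live below a folder passed as the first argument, and that each result is written next to its input with a `.hb` extension; `dict_for_file` and `process_tree` are illustrative names. The lookup compares the tonic column exactly, so `C` never selects the dictionary for `C#` or `Cb`.

```shell
#!/bin/sh
# Sketch: apply the right dictionary to every .txt file under a folder.
# Hypothetical assumptions: the top folder is passed as $1 (default "songs"),
# index.txt, script.sh and dictN.txt are in the current directory, and
# results are written next to each input as NAME.hb.

# Print the dictionary name for one file, e.g. "dict1" for "#tonic: C#".
dict_for_file() {
    # First "#tonic:" header line, value only; tr also drops spaces and CRs.
    tonic=$(sed -n 's/^#tonic:[[:space:]]*//p' "$1" | head -n 1 | tr -d '[:space:]')
    # Exact match on the tonic column, so "C" never matches "C#" or "Cb".
    awk -v t="$tonic" '$1 == "tonic:" && $2 == t { print $3; exit }' index.txt
}

process_tree() {
    find "${1:-songs}" -type f -name '*.txt' | while IFS= read -r file; do
        dict=$(dict_for_file "$file")
        if [ -z "$dict" ]; then
            echo "no dictionary for $file" >&2   # tonic missing from index.txt
            continue
        fi
        awk -f script.sh "$dict.txt" "$file" > "${file%.txt}.hb"
    done
}
```

Usage: run `process_tree songs` from the directory that holds index.txt. Files whose tonic is not listed in the index are reported on stderr and skipped rather than processed with a wrong dictionary.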

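Another approach is to extend the misleadingly named script.sh (it should be script.awk) so that it reads all the dictionaries and then decides which one to use when it reaches the /^#tonic:/ input line, but any substitutions on lines before the tonic: line are then difficult.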
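One way around the problem with lines before the tonic header is to buffer: hold the whole file in memory, pick the dictionary when `#tonic:` appears, and do all the substitutions in the `END` block. This is a minimal sketch of that idea, assuming the dictionaries are `from to` pairs stored as `<dir>/<name>.txt` with `<name>` taken from index.txt's third column (for example dict0.txt); `translate_file` is a hypothetical name. Like the question's script, it treats each dictionary key as an awk regular expression.

```shell
#!/bin/sh
# Sketch of the "one awk program" idea: load index.txt, wait for the
# "#tonic:" header to pick a dictionary, and buffer every line until the end
# so that lines before the tonic header are translated too.
# Hypothetical assumptions: dictionaries are "from to" pairs stored as
# <dictdir>/<name>.txt, where <name> is the third column of index.txt.
translate_file() {
    # $1 = index file, $2 = input file, $3 = dictionary directory (default ".")
    awk -v dictdir="${3:-.}" '
    NR == FNR { dictof[$2] = $3; next }      # index.txt: tonic -> dict name
    !picked && /^#tonic:/ {
        t = $0
        sub(/^#tonic:[ \t]*/, "", t)          # keep only the tonic value
        sub(/[ \t\r]+$/, "", t)               # drop trailing blanks / CR
        d = dictdir "/" dictof[t] ".txt"
        while ((getline line < d) > 0)        # load that dictionary
            if (split(line, f) >= 2) rep[f[1]] = f[2]
        close(d)
        picked = 1
    }
    { buf[++n] = $0 }                          # hold the whole file
    END {
        for (i = 1; i <= n; i++) {
            s = buf[i]
            for (k in rep) gsub(k, rep[k], s)
            print s
        }
    }' "$1" "$2"
}
```

Usage: `translate_file index.txt song.txt . > song.hb`. The trade-off is that each input file is held entirely in memory while it is processed, in exchange for a single awk run per file with no separate lookup step.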

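Personally, I would go with the second alternative, since it seems the most intuitive to me. You should pick whichever seems most intuitive to you and implement it. If the number or size of the files makes that take too long, you can get more creative and efficient in the code. But letting the computer do some extra work so that you don't have to is usually a good trade-off.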

Answer 2

Thanks,

I couldn't get that to work, but with someone else's help we came up with another solution:

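#!/bin/sh

# Split grep's output on newlines only, so file names with spaces survive
IFS='
'
# -x: the whole line must be "#tonic: C", so C# and Cb files are not matched
for file in $(grep -l -x "#tonic: C" *.txt); do
     awk -f script.awk dict0.txt "$file" > "${file%.txt}".hb
done

for file in $(grep -l -x "#tonic: C#" *.txt); do
     awk -f script.awk dict1.txt "$file" > "${file%.txt}".hb
done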

etc.

It may not be pretty, but it should do the trick.
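The per-tonic loops above can also be generated from index.txt instead of being written out once per tonic, and `find` lets them reach the subfolders the question mentions. A minimal sketch; `run_from_index` is a hypothetical name, and it assumes the header lines read exactly `#tonic: X` (no trailing spaces or carriage returns) with index.txt, script.awk and the dictN.txt files in the current directory. It uses `grep -x` because an unanchored `tonic: C` also matches `tonic: C#` and `tonic: Cb`, and depending on loop order a `tonic: B` pass could overwrite the `B#` and `Bb` results with the wrong dictionary.

```shell
#!/bin/sh
# Sketch: generate the per-tonic loops from index.txt and search subfolders
# with find. Hypothetical assumptions: headers read exactly "#tonic: X",
# and index.txt, script.awk and dictN.txt are in the current directory.
# Output goes next to each input as NAME.hb.
run_from_index() {
    # $1 = top folder to search (default: current directory)
    while read -r _ tonic dict; do
        [ -n "$dict" ] || continue   # skip blank or malformed index lines
        # -x: the whole line must equal "#tonic: X", so "C" never matches "C#"
        find "${1:-.}" -type f -name '*.txt' \
            -exec grep -l -x "#tonic: $tonic" {} + |
        while IFS= read -r file; do
            awk -f script.awk "$dict.txt" "$file" > "${file%.txt}.hb"
        done
    done < index.txt
}
```

Usage: `run_from_index songs` makes one `find` pass per line of index.txt (21 passes for the index in the question), which trades some extra disk reads for not having to maintain the loops by hand.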

Recursively replace text in many .txt files using different dictionaries (UNIX)
