Question

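I'm trying to write a simple script that performs multiple replacements in a large text file. I have a "map" file containing the records to search for and replace, one pair per line, separated by a space, and an "input" file in which I need to make the changes. The sample files and the script I wrote are below.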

Map file

 new_0 old_0
 new_1 old_1
 new_2 old_2
 new_3 old_3
 new_4 old_4

Input file

itsa(old_0)single(old_2)string(old_1)with(old_5)ocurrences(old_4)ofthe(old_3)records

Script

#!/bin/bash

while read -r mapline ; do

mapf1=`awk 'BEGIN {FS=" "} {print $1}' <<< "$mapline"`
mapf2=`awk 'BEGIN {FS=" "} {print $2}' <<< "$mapline"`

    for line in $(cat "input") ; do

       if [[ "${line}" == *"${mapf2}"* ]] ; then

       sed "s/${mapf2}/${mapf1}/g" <<< "${line}"    
    fi

    done < "input"

done < "map"

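The problem is that each search and replace works on its own, but I can't find a way to save the output of one iteration and use it as the input of the next. So my output looks like this: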

itsa(new_0)single(old_2)string(old_1)withocurrences(old_4)ofthe(old_3)records
itsa(old_0)single(old_2)string(new_1)withocurrences(old_4)ofthe(old_3)records
itsa(old_0)single(new_2)string(old_1)withocurrences(old_4)ofthe(old_3)records
itsa(old_0)single(old_2)string(old_1)withocurrences(old_4)ofthe(new_3)records
itsa(old_0)single(old_2)string(old_1)withocurrences(new_4)ofthe(old_3)records

However, the desired output looks like this:

itsa(new_0)single(new_2)string(new_1)withocurrences(new_4)ofthe(new_3)records

Could anyone shed some light on these dark waters? Thanks in advance!

Answer 1

This can be done in GNU Awk as follows:

awk 'FNR==NR{hash[$2]=$1; next} \
    {for (i=1; i<=NF; i++)\
    {for(key in hash) \
    {if (match ($i,key)) {$i=sprintf("(%s)",hash[key]); break}}}print}' \
    map-file FS='[()]' OFS= input-file

which produces the output:

itsa(new_0)single(new_2)string(new_1)withold_5ocurrences(new_4)ofthe(new_3)records
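
Because the line is split on the parentheses, each candidate key ends up alone in its own field. A variant of the command above could therefore look the field up directly in the array instead of looping over every key with match. The following is only a sketch of that idea, not the answerer's code: the function name lookup_replace is hypothetical, and it assumes every map line has exactly two fields.

```shell
# Sketch: replace whole fields by direct array lookup instead of regex matching.
# lookup_replace is a hypothetical helper; $1 is the map file, $2 the input file.
lookup_replace() {
    awk 'FNR==NR { if (NF >= 2) hash[$2] = $1; next }   # first file: old -> new
         {
             for (i = 1; i <= NF; i++)
                 if ($i in hash) $i = "(" hash[$i] ")"  # exact match on the field
             print
         }' "$1" FS='[()]' OFS= "$2"
}
```

Calling lookup_replace map-file input-file would be the equivalent invocation. Unlike match, the lookup only replaces a field that equals a key exactly, so a key such as old_1 would not also hit a field like old_10.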

Answer 2

Another one in GNU awk, using split and the ternary operator:

$ awk '
NR==FNR { a[$2]=$1; next }  
{
    n=split($0,b,"[()]")    
    for(i=1;i<=n;i++)       
        printf "%s%s",(i%2 ? b[i] : (b[i] in a? "(" a[b[i]] ")":"")),(i==n?ORS:"")
}' map foo
itsa(new_0)single(new_2)string(new_1)withocurrences(new_4)ofthe(new_3)records

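First, the map is read into the hash a. Then, while the input file is processed, every record is split on ( and ) with split. Every second element (i%2==0) is text that stood between parentheses, so it may be a key in the map. While printing with printf, the ternary operator tests whether the element is in a: if it is, the replacement is output inside parentheses; if it is not, nothing is output for that element.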
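
To make the odd/even indexing easier to see, here is a small sketch that prints each piece split produces together with its index. It is for illustration only; the function name split_demo and the shortened sample line are made up for the demo.

```shell
# Illustration only: list the pieces of a line split on "(" and ")",
# prefixed by their index, to show that the keys land on even indexes.
split_demo() {
    awk '{
        n = split($0, b, "[()]")
        for (i = 1; i <= n; i++)
            printf "%d:%s\n", i, b[i]
    }' <<< "$1"
}

split_demo 'itsa(old_0)single(old_2)string'
# prints:
# 1:itsa
# 2:old_0
# 3:single
# 4:old_2
# 5:string
```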

Answer 3

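Improving the existing script

Improvements:

Use "$()" instead of backticks (``). Quoted like this, it preserves whitespace, and it is easier to read.
Don't run sed once per line. sed already iterates over all lines itself and is faster than a loop in bash.

Adapted script: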

text="$(< input)"
while read -r mapline; do
        mapf1="$(awk 'BEGIN {FS=" "} {print $1}' <<< "$mapline")"
        mapf2="$(awk 'BEGIN {FS=" "} {print $2}' <<< "$mapline")"
        text="$(sed "s/${mapf2}/${mapf1}/g" <<< "$text")"
done < "map"
echo "$text"

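The variable $text holds the complete input file and is modified in every iteration. Once all replacements are done, the script prints the resulting text.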
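
As a variation on the adapted script (not part of the original answer), the same loop can be sketched with bash built-ins only: read splits each map line into two variables, and the ${var//pattern/replacement} expansion does the substitution. The function name replace_all and the temporary sample files are hypothetical stand-ins, and the sketch assumes two whitespace-separated fields per map line.

```shell
# Sketch: apply every "new old" pair from a map file to the whole input text,
# without calling awk or sed. replace_all is a hypothetical helper name.
replace_all() {
    local map_file="$1" input_file="$2" text new old
    text="$(< "$input_file")"
    while read -r new old; do
        [[ -n "$old" ]] || continue      # skip blank or one-field lines
        text="${text//"$old"/$new}"      # quoted, so $old is matched literally
    done < "$map_file"
    printf '%s\n' "$text"
}

# Usage with throwaway sample files standing in for "map" and "input".
tmp="$(mktemp -d)"
printf '%s\n' 'new_0 old_0' 'new_1 old_1' > "$tmp/map"
printf '%s\n' 'a(old_0)b(old_1)c(old_5)' > "$tmp/input"
replace_all "$tmp/map" "$tmp/input"      # prints a(new_0)b(new_1)c(old_5)
rm -rf "$tmp"
```

One difference from the sed version is that the search text is treated as a literal string here rather than as a regular expression.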

Alternative approach

Convert the map file into a set of sed commands (the pattern) and run sed once with it.

pattern="$(sed 's#\(.*\) \(.*\)#s/\2/\1/g#' map)"
sed "$pattern" input

The first command is the conversion step. The file

new_0 old_0
new_1 old_1
...

results in the pattern

s/old_0/new_0/g
s/old_1/new_1/g
...
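
The same idea can also be sketched as a small function that hands the generated commands to sed with -f through process substitution, so they never pass through a variable. This is a variant, not the answer's exact code: the function name apply_map and the sample files are hypothetical, the conversion expression is tightened to tolerate leading or trailing spaces, and it assumes the map entries contain no characters that are special to sed (such as /, & or regex metacharacters).

```shell
# Sketch: turn each "new old" map line into "s/old/new/g" and run sed once.
# apply_map is a hypothetical helper; $1 is the map file, $2 the input file.
# Assumes map entries contain no characters special to sed.
apply_map() {
    local map_file="$1" input_file="$2"
    sed -f <(sed 's#^ *\([^ ]*\)  *\([^ ]*\) *$#s/\2/\1/g#' "$map_file") "$input_file"
}

# Usage with throwaway sample files standing in for "map" and "input".
tmp="$(mktemp -d)"
printf '%s\n' 'new_0 old_0' 'new_1 old_1' > "$tmp/map"
printf '%s\n' 'a(old_0)b(old_1)' > "$tmp/input"
apply_map "$tmp/map" "$tmp/input"        # prints a(new_0)b(new_1)
rm -rf "$tmp"
```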

Iteratively replacing substrings in bash
