Question

I have two files: a mapping file and an input file.

cat map.txt

test:replace

cat input.txt

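The word test should be replaced.But the word testbook should not be replaced just because it has "_test" in it.

I am using the command below to find the words in the file and replace them with the values from the mapping file.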

awk 'FNR==NR{ array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' FS=":" map.txt FS=" " input.txt

It searches for the text listed in map.txt and replaces it with the word that follows the ":" wherever it occurs in the input file. In the example above, "test" is replaced with "replace".

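Current result:

The word replace should be replaced.But the word replacebook should not be replaced just because it has "_replace" in it.

Expected result:

The word replace should be replaced.But the word testbook should not be replaced just because it has "_test" in it.

So what I need is for the replacement to happen only when the word is found as a whole word. If the word has any other characters attached to it, it should be ignored.

Any help is appreciated.

Thanks in advance.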

Answer 1

Just loop over every word with for and replace where needed:

$ awk '
NR==FNR {                     # hash the map file
    a[$1]=$2
    next
}
{
    for(i=1;i<=NF;i++)        # loop over every word
        if($i in a)           # ... and if it's hashed...
            $i=a[$i]          # ... replace it
}1
' FS=":" map FS=" " input
The word replace should be replaced.But the word testbook should not be replaced just because it has "_test" in it.

Edit: Using match to extract the word from the string so that punctuation is preserved:

$ cat input2
Replace would Yoda test.
$ awk '
NR==FNR {                     # hash the map file
    a[$1]=$2
    next
}
{
    for(i=1;i<=NF;i++) {
        # an if belongs here to weed out fields that are not word+punctuation pairs,
        # e.g. if($i ~ /^[a-zA-Z]+[,.!?]$/)
        match($i,/^[a-zA-Z]+/)       # match from beginning of word (correct?)
        w=substr($i,RSTART,RLENGTH)  # extract word
        if(w in a)                   # match in a
            sub(w,a[w],$i)
    }
}1' FS=":" map FS=" " input2
Replace would Yoda replace.
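
The guard hinted at in the comment above could look something like the following. This is only a sketch: it assumes a "word" is a run of ASCII letters optionally followed by sentence punctuation, and the second line of input2 is made up here to show fields that get skipped. The replacement is built by plain assignment instead of sub(), so the original punctuation is just appended back.

```shell
tmp=$(mktemp -d)
printf 'test:replace\n' > "$tmp/map"
# second input line is illustrative only
printf '%s\n' 'Replace would Yoda test.' 'A test, a testbook. and a test-case' > "$tmp/input2"

guarded=$(awk '
NR==FNR {                       # hash the map file
    a[$1]=$2
    next
}
{
    for(i=1;i<=NF;i++) {
        # guard: letters only, optionally followed by sentence punctuation
        if($i ~ /^[a-zA-Z]+[,.!?]*$/) {
            match($i,/^[a-zA-Z]+/)            # the leading word
            w=substr($i,RSTART,RLENGTH)
            if(w in a)
                $i=a[w] substr($i,RLENGTH+1)  # replacement plus the original punctuation
        }
    }
}1' FS=":" "$tmp/map" FS=" " "$tmp/input2")
printf '%s\n' "$guarded"
```

With the guard in place a field such as test-case is left alone, whereas the unguarded version would turn it into replace-case.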

Answer 2

Using GNU awk for word boundaries:

awk -F':' '
NR==FNR { map[$1] = $2; next }
{
    for (old in map) {
        new = map[old]
        gsub("\\<"old"\\>",new)
    }
    print
}
' map input

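The above will fail if old contains regexp metacharacters or escape characters, or if new contains &, but as long as both consist only of word-constituent characters it will be fine.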
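
If that caveat is a concern, one possible way around it is to not build a regexp from the map at all. A minimal sketch, assuming a word is a run of letters, digits and underscores (so a key containing any other character would never match); the rnd:R&D map entry and the second input line are hypothetical, added only to show that & comes through literally. It uses no GNU extensions.

```shell
tmp=$(mktemp -d)
# the second map entry and second input line are illustrative only
printf '%s\n' 'test:replace' 'rnd:R&D' > "$tmp/map"
printf '%s\n' 'The word test should be replaced.But the word testbook should not be replaced just because it has "_test" in it.' 'rnd and brnd' > "$tmp/input"

result=$(awk -F':' '
NR==FNR { map[$1] = $2; next }          # load old:new pairs from the map file
{
    rest = $0
    out = ""
    # peel off one run of word characters at a time
    while (match(rest, /[A-Za-z0-9_]+/)) {
        word = substr(rest, RSTART, RLENGTH)
        out = out substr(rest, 1, RSTART - 1)          # text before the word, unchanged
        out = out ((word in map) ? map[word] : word)   # plain lookup: no regexp, no & expansion
        rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest                      # append whatever follows the last word
}
' "$tmp/map" "$tmp/input")
printf '%s\n' "$result"
```

Because each word is looked up as an array key and concatenated into the output, nothing in old or new is ever interpreted by gsub, and the original spacing and punctuation of the line are kept as they were.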

Replacing values in one file with values from another file not working properly
