Question

Here is a dataset with a header row and a delimiter, like this:

a|b|c|d|e|f|g
1|2|3|4|5|6|5
2|4|2|3|5|2|1

And here is a configuration file with some column names and values, like this:

b:5
d:6

My goal is to modify the dataset using the configuration file. The result should look like this:

a|b|c|d|e|f|g
1|5|3|6|5|6|5
2|5|2|6|5|2|1

How can this be done without using a "for" loop outside of awk?

Answer 1

Here is one way to do it with awk:

awk '
     NR == FNR { rep[$1] = $2; next } 
     FNR == 1 { for (i = 1; i <= NF; ++i) if ($i in rep) cols[i] = rep[$i] }
     FNR > 1 { for (i in cols) $i = cols[i] }
     1
' FS=':' replacements FS='|' OFS='|' dataset

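First, save all key:value replacements into the array rep
- use the standard NR == FNR idiom to target the first file (the total record number equals the record number within the current file)
- use next to skip the rest of the script
For the first line of the dataset (the second file), find the columns whose header has a replacement and store the column number together with its replacement value in cols
For the remaining lines of the dataset, overwrite those columns with their replacement values
Print every line of the second file using the pattern 1 (always true), which triggers the default action { print }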
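To make the first two steps easier to follow, here is a minimal sketch that only prints the column mapping instead of rewriting anything. The temporary directory and the variable name column_map are assumptions made for this demo; the file contents are copied from the question.

```shell
# Sample files copied from the question; the temporary directory exists only for this demo.
tmp=$(mktemp -d)
printf 'b:5\nd:6\n' > "$tmp/replacements"
printf 'a|b|c|d|e|f|g\n1|2|3|4|5|6|5\n' > "$tmp/dataset"

column_map=$(awk '
    NR == FNR { rep[$1] = $2; next }    # first file: header name -> new value
    FNR == 1 {                          # header line of the second file
        for (i = 1; i <= NF; ++i)
            if ($i in rep) print "column " i " (" $i ") -> " rep[$i]
    }
' FS=':' "$tmp/replacements" FS='|' "$tmp/dataset")

echo "$column_map"
rm -r "$tmp"
```

With the sample data this should print "column 2 (b) -> 5" followed by "column 4 (d) -> 6", which are the index/value pairs the full script keeps in cols.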

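Note that because the two files use different delimiters, the delimiters are given as arguments after the awk script. FS sets the input field separator and OFS sets the output field separator, and each assignment takes effect just before the next file name argument is read. The arguments should be read as: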

# read the file 'replacements' with input field separator set to ':'
FS=':' replacements
# read the file 'dataset' with input and output field separator set to '|'
FS='|' OFS='|' dataset
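A quick way to convince yourself that each assignment is applied just before the following file is read is to print the field count per file. This is only a sketch: the shortened sample files and the variable name field_counts are assumptions for the demo.

```shell
# Shortened sample files; the temporary directory exists only for this demo.
tmp=$(mktemp -d)
printf 'b:5\n' > "$tmp/replacements"
printf 'a|b|c\n' > "$tmp/dataset"

# FS=':' is in effect for replacements, FS='|' for dataset.
field_counts=$(cd "$tmp" && awk '{ print FILENAME ": NF=" NF }' FS=':' replacements FS='|' dataset)

echo "$field_counts"
rm -r "$tmp"
```

This should report NF=2 for replacements and NF=3 for dataset, showing that each file was split with its own separator.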

Testing it:

$ cat replacements 
b:5
d:6
$ cat dataset 
a|b|c|d|e|f|g
1|2|3|4|5|6|5
2|4|2|3|5|2|1
$ awk '
>      NR == FNR { rep[$1] = $2; next } 
>      FNR == 1 { for (i = 1; i <= NF; ++i) if ($i in rep) cols[i] = rep[$i] }
>      FNR > 1 { for (i in cols) $i = cols[i] }
>      1
> ' FS=':' replacements FS='|' OFS='|' dataset
a|b|c|d|e|f|g
1|5|3|6|5|6|5
2|5|2|6|5|2|1

Answer 2

It is probably most sensible to proceed in the following order. First, parse the configuration (assuming the GNU dialect of awk):

gawk -F \| -v OFS=\| 'NR == FNR { # this pattern matches only while reading the first file
    split($0, mapping, /:/)
    rules[mapping[1]] = mapping[2]
    next # short-circuit to skip other blocks
}

Next, on the first line of the data file, parse the column headers:

FNR == 1 {
    for(i = 1; i <= NF; ++i) if($i in rules) forcedValues[i] = rules[$i]
    print
    next
}

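You now have an array forcedValues that, for some of the column numbers from 1 to 7 (in your example), holds the value that column should be reset to. Now process the rest of the file: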

{
    for(i in forcedValues) $i = forcedValues[i]
    print
}' config.txt input.txt > output.txt

(The three code fragments in this answer are parts of a single shell command and should be joined with newlines.)
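Joined together, the fragments might look like the sketch below. It makes a few assumptions: it calls plain awk rather than gawk, since nothing in the fragments appears to depend on GNU extensions; it creates config.txt and input.txt in a temporary directory from the question's sample data; and the variable name result is only there for the demo.

```shell
# Sample files copied from the question; the temporary directory exists only for this demo.
tmp=$(mktemp -d)
printf 'b:5\nd:6\n' > "$tmp/config.txt"
printf 'a|b|c|d|e|f|g\n1|2|3|4|5|6|5\n2|4|2|3|5|2|1\n' > "$tmp/input.txt"

awk -F '|' -v OFS='|' '
NR == FNR {                        # config file: FS is "|", so split on ":" by hand
    split($0, mapping, /:/)
    rules[mapping[1]] = mapping[2]
    next
}
FNR == 1 {                         # header line: map column numbers to forced values
    for (i = 1; i <= NF; ++i) if ($i in rules) forcedValues[i] = rules[$i]
    print
    next
}
{                                  # data lines: overwrite the mapped columns
    for (i in forcedValues) $i = forcedValues[i]
    print
}' "$tmp/config.txt" "$tmp/input.txt" > "$tmp/output.txt"

result=$(cat "$tmp/output.txt")
echo "$result"
rm -r "$tmp"
```

For the sample data the output should match the result requested in the question. Unlike Answer 1, this variant keeps FS set to "|" for both files and separates the key from the value with split.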

How can I use awk to modify a large file with specific columns and values?
