Question

I have an input file like this:

SomeSection.Foo
OtherSection.Foo
OtherSection.Goo

...and another file that describes which objects belong to each section:

[SomeSection]
Blah
Foo
[OtherSection]
Foo
Goo

The desired output is:

SomeSection.2   // that's because Foo appears 2nd in SomeSection
OtherSection.1  // that's because Foo appears 1st in OtherSection
OtherSection.2  // that's because Goo appears 2nd in OtherSection

(The number and names of the sections and objects vary.)

How would you do something like this in awk?

Thanks in advance, Adrian.

Answer 1

One possibility:

Contents of script.awk (with comments):

## When 'FNR == NR', the first input file is being processed.
## If the line begins with '[', extract the section name (stored in
## 'object') and reset the position counter for its objects.
FNR == NR && $0 ~ /^\[/ {
        object = substr( $0, 2, length($0) - 2 )
        pos = 0
        next
}

## This block processes the objects of each section. It saves them in
## an array keyed by (section, object). The variable 'pos' increments
## with each object processed.
FNR == NR {
        arr_obj[object, $0] = ++pos
        next
}

## This block processes the second file. It splits each line on '.',
## looks up the (section, object) pair in the array, and prints the
## section name followed by the object's position.
FNR < NR {
        ret = split( $0, obj, /\./ )
        if ( ret != 2 ) {
                next
        }
        printf "%s.%d\n", obj[1], arr_obj[ obj[1] SUBSEP obj[2] ]
}

Run the script (the order of the input files matters: object.txt, which contains the sections and their objects, must come before input.txt):

awk -f script.awk object.txt input.txt

Result:

SomeSection.2
OtherSection.1
OtherSection.2
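To see why the file order matters, here is a minimal sketch that recreates the question's sample data as object.txt and input.txt in a temporary directory (created with mktemp; the directory is an assumption of this sketch), writes a condensed copy of the script.awk logic above, and runs it in both orders. It assumes a POSIX shell and an awk such as gawk or mawk. With the files swapped, the input lines are stored as if they were objects, and none of the object-file lines contains a '.', so nothing is printed.

```shell
#!/bin/sh
# Sketch: rebuild the sample files in a temp dir and run script.awk both ways.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Sections and their objects, as in the question.
cat > object.txt <<'EOF'
[SomeSection]
Blah
Foo
[OtherSection]
Foo
Goo
EOF

# Lines to translate, as in the question.
cat > input.txt <<'EOF'
SomeSection.Foo
OtherSection.Foo
OtherSection.Goo
EOF

# Same logic as script.awk in this answer, condensed.
cat > script.awk <<'EOF'
FNR == NR && $0 ~ /^\[/ { object = substr($0, 2, length($0) - 2); pos = 0; next }
FNR == NR               { arr_obj[object, $0] = ++pos; next }
FNR < NR {
    if (split($0, obj, /\./) != 2) next
    printf "%s.%d\n", obj[1], arr_obj[obj[1] SUBSEP obj[2]]
}
EOF

right=$(awk -f script.awk object.txt input.txt)   # object file first
wrong=$(awk -f script.awk input.txt object.txt)   # swapped order: prints nothing

printf 'correct order:\n%s\n' "$right"
printf 'swapped order:\n%s\n' "$wrong"

cd / && rm -rf "$dir"
```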

EDIT to answer the question in the comments:

I'm not an expert, but I'll try to explain how I understand it:

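SUBSEP is a character used to separate the parts of an array index when you want to use several values together as a key. By default it is \034, but you can change it, just as you can change RS or FS.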
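A small sketch of that behaviour, assuming a POSIX shell and an awk such as gawk or mawk. The first program confirms the default value and uses gsub only to make the invisible \034 visible in the key; the second assigns SUBSEP like any other built-in variable before building the key. The section and object names are just the question's sample values.

```shell
#!/bin/sh
# Default SUBSEP: check its value and show the joined key.
default=$(awk 'BEGIN {
    if (SUBSEP == "\034") print "SUBSEP is \\034"
    a["SomeSection", "Blah"] = 1          # key is "SomeSection" SUBSEP "Blah"
    for (k in a) {
        gsub(SUBSEP, "<SUBSEP>", k)       # make the invisible separator visible
        print k
    }
}')
printf '%s\n' "$default"

# Custom SUBSEP: assigned like RS or FS.
custom=$(awk 'BEGIN {
    SUBSEP = ":"
    a["SomeSection", "Blah"] = 1
    for (k in a) print k
}')
printf '%s\n' "$custom"
```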

In the statement arr_obj[object, $0] = ++pos, the comma joins all the values with the value of SUBSEP, so in this case it results in:

arr_obj[SomeSection\034Blah] = 1

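At the end of the script, I access the index by using the SUBSEP variable explicitly, arr_obj[ obj[1] SUBSEP obj[2] ], but it means the same as arr_obj[object, $0] in the earlier block.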
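To illustrate that the two spellings address the same element, here is a minimal sketch (POSIX shell plus any awk, sample values from the question): it stores a value with the comma form, reads it back with the explicit SUBSEP form, and then counts the keys to show that the read did not create a second element.

```shell
#!/bin/sh
# Store with the comma form, read with the explicit SUBSEP form.
out=$(awk 'BEGIN {
    arr_obj["SomeSection", "Foo"] = 2                        # comma form
    print "explicit:", arr_obj["SomeSection" SUBSEP "Foo"]   # explicit form, same key
    n = 0
    for (k in arr_obj) n++                                   # count distinct keys
    print "distinct keys:", n
}')
printf '%s\n' "$out"
```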

You can also access each part of the index by splitting it with the SUBSEP variable, like this:

for (key in arr_obj) {                     ## Assign 'string\034string' to 'key' variable
    split( key, key_parts, SUBSEP )        ## Split 'key' with the content of SUBSEP variable.
    ...
}

The result is:

key_parts[1] -> SomeSection
key_parts[2] -> Blah
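A runnable version of that loop, as a sketch: it assumes the object file from the question (written to a temporary file via mktemp, which is only for this example), builds arr_obj the same way script.awk does, and splits every key back into its section and object. Because the order of `for (key in ...)` is unspecified in awk, the output is piped through `LC_ALL=C sort` to make it deterministic.

```shell
#!/bin/sh
# Hypothetical temp copy of the object file from the question.
objfile=$(mktemp)
cat > "$objfile" <<'EOF'
[SomeSection]
Blah
Foo
[OtherSection]
Foo
Goo
EOF

out=$(awk '
/^\[/ { object = substr($0, 2, length($0) - 2); pos = 0; next }   # section header
{ arr_obj[object, $0] = ++pos }                                   # object line
END {
    for (key in arr_obj) {
        split(key, key_parts, SUBSEP)   # key_parts[1] = section, key_parts[2] = object
        print key_parts[1], key_parts[2], arr_obj[key]
    }
}' "$objfile" | LC_ALL=C sort)
rm -f "$objfile"
printf '%s\n' "$out"
```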

Answer 2

This awk one-liner should do the job:

 awk  'BEGIN{FS="[\\.\\]\\[]"}
        NR==FNR{ if(NF>1){ i=1; idx=$2; }else{ s[idx"."$1]=i; i++; } next; }
        { if($0 in s) print $1"."s[$0] } ' f2 input
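The one-liner relies on how its FS (any of '.', ']' or '[') splits each kind of line: a section header yields more than one field with the name in $2, an object line yields a single field, and an input line has the section in $1. The sketch below just prints NF, $1 and $2 for one line of each kind, reusing the same FS string; it assumes an awk that accepts backslash escapes inside bracket expressions, as gawk and mawk do.

```shell
#!/bin/sh
# Show how the FS from this answer splits a header, an object and an input line.
out=$(printf '%s\n' '[SomeSection]' 'Blah' 'SomeSection.Foo' |
    awk 'BEGIN { FS = "[\\.\\]\\[]" }   # same FS as the one-liner
         { printf "%s: NF=%d $1=\"%s\" $2=\"%s\"\n", $0, NF, $1, $2 }')
printf '%s\n' "$out"
```

The header line gives NF=3 because the leading '[' and trailing ']' each produce an empty field, which is why `NF>1` is enough to tell headers apart from object lines.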

See the test below:

kent$  head input f2
==> input <==
SomeSection.Foo
OtherSection.Foo
OtherSection.Goo

==> f2 <==
[SomeSection]
Blah
Foo
[OtherSection]
Foo
Goo

kent$  awk  'BEGIN{FS="[\\.\\]\\[]"}
        NR==FNR{ if(NF>1){ i=1; idx=$2; }else{ s[idx"."$1]=i; i++; } next; }
        { if($0 in s) print $1"."s[$0] } ' f2 input
SomeSection.2
OtherSection.1
OtherSection.2

Can awk replace fields based on a separate specification file?
