Question

I have a text file like this:

Apples
Big 7
Small 6

Apples
Good 5
Bad 3

Oranges
Big 4
Small 2
Good 1
Bad 5

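How can I access a specific section of this file and then grep it? For example, if I need to find how many Good Oranges there are, how would I do that from the command line with, say, awk, using this file as input?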

Answer 1

You can use the range operator like this:

awk '/Apples/,/^$/ { if (/Good/) print $2}' file

which prints how many good apples there are:

5

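The range operator (the comma) evaluates to true once the first condition is met and stays true until the second condition is met. The second pattern, /^$/, matches an empty line. This means that only the relevant section is tested for the Good, Bad, etc. attributes.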
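A minimal sketch of the same range-pattern idea, wrapped in a shell function so the section and attribute can be passed on the command line. The function name section_value and the file name fruits.txt are hypothetical. It assumes each section header sits alone on its line exactly as in the sample (no trailing spaces or CRLF line endings), because it compares the whole line with $0 == sec instead of using a regex, so a section name cannot accidentally match part of another line.

```bash
# Sample input from the question (fruits.txt is a hypothetical file name).
printf '%s\n' Apples 'Big 7' 'Small 6' '' Apples 'Good 5' 'Bad 3' '' \
    Oranges 'Big 4' 'Small 2' 'Good 1' 'Bad 5' > fruits.txt

# section_value FILE SECTION ATTR (hypothetical helper).
# The range pattern turns on at the line that equals SECTION and turns off
# at the next empty line; inside the range, the value of ATTR is printed.
section_value() {
    awk -v sec="$2" -v attr="$3" '
        $0 == sec, /^$/ { if ($1 == attr) print $2 }
    ' "$1"
}

section_value fruits.txt Oranges Good
```

The last section (Oranges) still works even though the file does not end with a blank line: the range simply stays open until the end of input.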

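I'm assuming your original input file isn't double-spaced? If it is, the approach above can be patched to skip every other line:

awk '!(NR%2){next} /Oranges/,/^$/ { if (/Good/) print $2}' file

When the record number NR is even, NR%2 is 0 and !0 is true, so every other line is skipped. The parentheses matter: in awk, ! binds more tightly than %, so !NR%2 would be parsed as (!NR)%2, which is 0 for every record and skips nothing.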
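A quick way to check the double-spacing patch, assuming a double-spaced file can be simulated with sed's G command (which appends an empty line after every input line). The sketch uses the parenthesized test !(NR%2); the file names fruits.txt and fruits_double.txt are hypothetical.

```bash
# Rebuild the question's input (fruits.txt is a hypothetical file name).
printf '%s\n' Apples 'Big 7' 'Small 6' '' Apples 'Good 5' 'Bad 3' '' \
    Oranges 'Big 4' 'Small 2' 'Good 1' 'Bad 5' > fruits.txt

# Simulate a double-spaced copy: G appends an empty line after every line.
sed G fruits.txt > fruits_double.txt

# !(NR%2) is true on even-numbered records, i.e. the inserted blank lines,
# so they are skipped before the range pattern ever sees them.
awk '!(NR%2){next} /Oranges/,/^$/ { if (/Good/) print $2 }' fruits_double.txt
```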

Answer 2

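When you have name/value pairs, it's usually best to first build an array indexed by name that holds the values; then you can print whatever you're interested in by indexing the array with the appropriate name(s):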

$ awk 'NF==1{key=$1} {val[key,$1]=$2} END{print val["Oranges","Good"]}' file
1

$ awk 'NF==1{key=$1} {val[key,$1]=$2} END{print val["Apples","Bad"]}' file
3
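The same lookup can be turned into a reusable shell function by passing the section and name in with awk's -v option instead of hard-coding them in the END block. This is a sketch: the function name lookup and the file fruits.txt are hypothetical, and it stores only two-field lines (NF == 2), a small deviation from the one-liners above that avoids storing entries for blank and header lines.

```bash
# Sample input from the question (fruits.txt is a hypothetical file name).
printf '%s\n' Apples 'Big 7' 'Small 6' '' Apples 'Good 5' 'Bad 3' '' \
    Oranges 'Big 4' 'Small 2' 'Good 1' 'Bad 5' > fruits.txt

# lookup FILE SECTION NAME (hypothetical helper).
lookup() {
    awk -v k="$2" -v n="$3" '
        NF == 1 { key = $1 }            # a one-field line starts a new section
        NF == 2 { val[key, $1] = $2 }   # name/value line in the current section
        END     { print val[k, n] }
    ' "$1"
}

lookup fruits.txt Oranges Good
lookup fruits.txt Apples Bad
```

A combination that never appears (for example lookup fruits.txt Pears Good) prints an empty line, because a nonexistent array element evaluates to the empty string.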

Or, if you're looking for a starting point for a more complete/complex requirement, here's one way to do it:

$ awk '
NF {
    if (NF==1) {
        key=$1
        keys[key]
    }
    else {
        val[key,$1]=$2
        names[$1]
    }
}
END {
    for (key in keys)
        for (name in names)
            print key, name, val[key,name]
}
' file
Apples Big 7
Apples Bad 3
Apples Good 5
Apples Small 6
Oranges Big 4
Oranges Bad 5
Oranges Good 1
Oranges Small 2
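The order in which awk's for (x in array) loop visits elements is unspecified, so the listing above may come out in a different order with another awk implementation. A sketch that prints only the section/name pairs that actually occur and pipes the result through sort for a stable order; fruit_table and fruits.txt are hypothetical names, and LC_ALL=C is an assumption made so the sort order does not depend on the locale.

```bash
# Sample input from the question (fruits.txt is a hypothetical file name).
printf '%s\n' Apples 'Big 7' 'Small 6' '' Apples 'Good 5' 'Bad 3' '' \
    Oranges 'Big 4' 'Small 2' 'Good 1' 'Bad 5' > fruits.txt

# fruit_table FILE (hypothetical helper): one "section name value" line per
# pair that occurs in FILE, sorted for a deterministic order.
fruit_table() {
    awk '
        NF == 1 { key = $1; keys[key] }            # section header
        NF == 2 { val[key, $1] = $2; names[$1] }   # name/value line
        END {
            for (key in keys)
                for (name in names)
                    if ((key, name) in val)        # skip pairs never seen
                        print key, name, val[key, name]
        }
    ' "$1" | LC_ALL=C sort
}

fruit_table fruits.txt
```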

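To test @JohnB's theory that a shell script would be faster than an awk script if there were thousands of files, I copied the OP's input file 5,000 times into a tmp directory and then ran these 2 equivalent scripts on them (the bash one based on John's answer in this thread, followed by an awk one that does the same thing as the bash one):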

$ cat tst.sh
for file in "$@"; do
    while read -r field1 field2 ; do
        [ -z "$field2" ] && name="$field1"
        case $name in
            Oranges) [ "$field1" = "Good" ] && echo "$field2";;
        esac
    done < "$file"
done

$ cat tst.awk
NF==1 { fruit=$1 }
fruit=="Oranges" && $1=="Good" { print $2 }

And here are the results of running them on those 5,000 files:

$ time ./tst.sh tmp/* > bash.out
real    0m6.490s
user    0m2.792s
sys     0m3.650s

$ time awk -f tst.awk tmp/* > awk.out
real    0m2.262s
user    0m0.311s
sys     0m1.934s

The 2 output files are identical.
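A scaled-down sketch for reproducing the setup above: it writes N copies of the sample input into tmp/, runs both scripts over them, and uses cmp to confirm that the outputs match. N defaults to 50 here (an arbitrary choice, not the 5,000 used above), tst.sh is run with sh because it only uses POSIX shell features, and timing is left out; prefix the two runs with time to compare them.

```bash
N=${N:-50}   # number of copies; an assumption, scale up to 5000 to match

# Sample input from the question (fruits.txt is a hypothetical file name).
printf '%s\n' Apples 'Big 7' 'Small 6' '' Apples 'Good 5' 'Bad 3' '' \
    Oranges 'Big 4' 'Small 2' 'Good 1' 'Bad 5' > fruits.txt

mkdir -p tmp
i=1
while [ "$i" -le "$N" ]; do
    cp fruits.txt "tmp/file$i"
    i=$((i + 1))
done

# The two scripts from the answer, written out unchanged.
cat > tst.sh <<'EOF'
for file in "$@"; do
    while read -r field1 field2 ; do
        [ -z "$field2" ] && name="$field1"
        case $name in
            Oranges) [ "$field1" = "Good" ] && echo "$field2";;
        esac
    done < "$file"
done
EOF

cat > tst.awk <<'EOF'
NF==1 { fruit=$1 }
fruit=="Oranges" && $1=="Good" { print $2 }
EOF

sh tst.sh tmp/* > bash.out
awk -f tst.awk tmp/* > awk.out
cmp bash.out awk.out && echo "outputs identical"
```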

Answer 3

You can use Bash to read the file line by line in a loop.

while read -ra fruit; do
    # A line with a single field is a section header such as "Oranges".
    [ ${#fruit[@]} -eq 1 ] && name=${fruit[0]}
    case $name in
        Oranges) [ "${fruit[0]}" = "Good" ] && echo "${fruit[1]}";;
    esac
done < file

You can also turn it into a function and pass arguments to get the attribute value for any fruit.

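# read_fruit FRUIT ATTR: print the value of ATTR in the FRUIT section(s) of "file".
read_fruit (){
    while read -ra fruit; do
        [ ${#fruit[@]} -eq 1 ] && name=${fruit[0]}
        case $name in
            # Quoted so the argument is matched literally, not as a glob.
            "$1") [ "${fruit[0]}" = "$2" ] && echo "${fruit[1]}";;
        esac
    done < file
}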

Usage:

read_fruit Apples Small

Result:

6
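A sketch of a variant of read_fruit that takes the file name as its first argument instead of hard-coding file, and keeps its loop variables local to the function. The name read_fruit_from and the file fruits.txt are hypothetical; it requires bash because of read -a, local and [[ ]].

```bash
# Sample input from the question (fruits.txt is a hypothetical file name).
printf '%s\n' Apples 'Big 7' 'Small 6' '' Apples 'Good 5' 'Bad 3' '' \
    Oranges 'Big 4' 'Small 2' 'Good 1' 'Bad 5' > fruits.txt

# read_fruit_from FILE FRUIT ATTR (hypothetical variant of read_fruit).
read_fruit_from() {
    local file=$1 want_fruit=$2 want_attr=$3 name=""
    local -a fruit
    while read -ra fruit; do
        # A line with a single field is a section header such as "Oranges".
        (( ${#fruit[@]} == 1 )) && name=${fruit[0]}
        # Quoted right-hand sides are compared literally, not as patterns.
        if [[ $name == "$want_fruit" && ${fruit[0]-} == "$want_attr" ]]; then
            echo "${fruit[1]}"
        fi
    done < "$file"
}

read_fruit_from fruits.txt Apples Small
read_fruit_from fruits.txt Oranges Good
```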

How to get to a specific section of a text file and then search it
