How do I process multiple fields as they run through a pipe?

Time: 2017-07-20 18:16:06

Tags: bash pipe cut io-redirection

If I have a tab-delimited data file input.dat, formatted like this:

#id  acct    name   city          age
 12  100290  Sally  San Francisco 24
 15  102911  Jerry  Sacramento    40
 99  102134  Amir   Eureka        82

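Can I use cut(1) or something similar to run a different processing function (e.g. lookup_id, scrub_acct, scrub_name, lookup_city, scrub_age) on each field as the data runs through a pipe?

It's easy enough to do this with a single field: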

cat input.dat | cut -f1 | lookup_id > output.dat

But I'm wondering whether there is a way to do this for every field and have the results redirected into output.dat:

#id  acct    name   city          age
 AA  XXXXX0  SXXXX  city-57       20s
 AC  XXXXX1  JXXXX  city-29       40s
 AF  XXXXX4  AXXXX  city-100      80s

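Maybe the real question is whether this can be done simply?

I've also been thinking about how paste(1) could glue the columns back together, but maybe there's a better way.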
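A rough sketch of that paste(1) idea, assuming bash (for process substitution) and assuming each function filters a whole column, reading one value per line on stdin, as in the single-field pipeline above. The sed-based function bodies, the temporary directory, and the three-column, header-less sample are made up purely for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical stream filters: each reads a whole column, one value per line.
lookup_id()  { sed 's/^/id-/'; }
scrub_acct() { sed 's/^.*\(.\)$/XXXXX\1/'; }   # keep only the last character
scrub_name() { sed 's/^\(.\).*$/\1XXXX/'; }    # keep only the first character

# Illustrative tab-delimited sample (no header line) in a scratch directory.
tmp=$(mktemp -d)
printf '12\t100290\tSally\n15\t102911\tJerry\n' > "$tmp/input.dat"

# cut splits out each column, the function transforms it,
# and paste glues the transformed columns back together with tabs.
paste <(cut -f1 "$tmp/input.dat" | lookup_id) \
      <(cut -f2 "$tmp/input.dat" | scrub_acct) \
      <(cut -f3 "$tmp/input.dat" | scrub_name) > "$tmp/output.dat"
```

This reads the input once per column, and it only lines up correctly if every function emits exactly one output line per input line.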

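2 answers:

Answer 0: (score: 2)

Row/column data is usually easier to process in awk, but since shell functions are involved here, it is better to handle it in the shell itself.

Assuming lookup_id, scrub_acct, scrub_name, lookup_city, scrub_age are shell functions or scripts that read their input from stdin, you can put them in an array and call them while looping over each record of the input file: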

# example shell functions
lookup_id() { read -r str; printf "lookup_id: %s\n" "$str"; }
scrub_acct() { read -r str; printf "scrub_acct: %s\n" "$str"; }
scrub_name() { read -r str; printf "scrub_name: %s\n" "$str"; }
lookup_city() { read -r str; printf "lookup_city: %s\n" "$str"; }
scrub_age() { read -r str; printf "scrub_age: %s\n" "$str"; }

# array of functions or scripts to be invoked
fnarr=(lookup_id scrub_acct scrub_name lookup_city scrub_age)

# main processing
while IFS=$'\t' read -ra ary; do
   for ((i=0; i<${#ary[@]}; i++)); do
      # call function for each field value
      "${fnarr[i]}" <<< "${ary[i]}"
   done
   echo '============================='
done < <(tail -n +2 input.dat)   # tail -n +2 skips the header line

Output:

lookup_id: 12
scrub_acct: 100290
scrub_name: Sally
lookup_city: San Francisco
scrub_age: 24
=============================
lookup_id: 15
scrub_acct: 102911
scrub_name: Jerry
lookup_city: Sacramento
scrub_age: 40
=============================
lookup_id: 99
scrub_acct: 102134
scrub_name: Amir
lookup_city: Eureka
scrub_age: 82
=============================
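The loop above prints one transformed field per line. To get a tab-delimited output.dat like the one in the question, the per-field results can be collected and re-joined with tabs. The following is only a sketch under stated assumptions: the function bodies are hypothetical stand-ins that mimic the sample output (the id and city lookups are invented), and the sample data is written to a temporary directory so the block runs on its own:

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins; each reads a single value on stdin.
lookup_id()   { read -r s; printf 'id-%s\n' "$s"; }
scrub_acct()  { read -r s; printf 'XXXXX%s\n' "${s: -1}"; }   # last digit only
scrub_name()  { read -r s; printf '%sXXXX\n' "${s:0:1}"; }    # first letter only
lookup_city() { read -r s; printf 'city-%s\n' "${#s}"; }      # invented lookup
scrub_age()   { read -r s; printf '%s0s\n' "${s%?}"; }        # 24 -> 20s

fnarr=(lookup_id scrub_acct scrub_name lookup_city scrub_age)

# Illustrative tab-delimited sample in a scratch directory.
tmp=$(mktemp -d)
printf '#id\tacct\tname\tcity\tage\n' > "$tmp/input.dat"
printf '12\t100290\tSally\tSan Francisco\t24\n' >> "$tmp/input.dat"
printf '15\t102911\tJerry\tSacramento\t40\n' >> "$tmp/input.dat"

{
  # Pass the header line through unchanged.
  IFS= read -r header && printf '%s\n' "$header"
  while IFS=$'\t' read -ra ary; do
    out=()
    for ((i = 0; i < ${#ary[@]}; i++)); do
      # Capture each function's result instead of printing it directly.
      out+=("$("${fnarr[i]}" <<< "${ary[i]}")")
    done
    # Join the collected results with tabs (subshell keeps IFS local).
    (IFS=$'\t'; printf '%s\n' "${out[*]}")
  done
} < "$tmp/input.dat" > "$tmp/output.dat"
```

One caveat: because tab counts as IFS whitespace, read collapses consecutive tabs, so empty fields would shift the remaining columns onto the wrong functions.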

Answer 1: (score: 1)

Try something like this with awk:

awk -F'\t' '{system("lookup_id "  $1); printf("\t"); \
             system("scrub_acct " $2); printf("\t"); \
             ...
            }' input.dat
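Two things to keep in mind with this approach: awk's system() runs the command through sh, so functions defined only in the calling bash session are typically not visible to it, and each command's output normally ends in a newline, which breaks up the tab-joined line. A variation that sidesteps both, swapping system() for awk's "command | getline" form, might look like the sketch below. It assumes the helpers exist as executables on PATH that take the field value as their first argument; the two tiny scripts, the run() helper, and the two-column sample are hypothetical:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)

# Hypothetical helper executables standing in for the real ones.
cat > "$tmp/lookup_id" <<'EOF'
#!/bin/sh
printf 'id-%s\n' "$1"
EOF
cat > "$tmp/scrub_name" <<'EOF'
#!/bin/sh
printf '%.1sXXXX\n' "$1"
EOF
chmod +x "$tmp/lookup_id" "$tmp/scrub_name"

# Illustrative two-column, tab-delimited sample.
printf '12\tSally Ann\n15\tJerry\n' > "$tmp/input.dat"

PATH="$tmp:$PATH" awk -F'\t' -v q="'" '
  # Run cmd with one single-quoted argument; return its first output line.
  # Note: a value containing a single quote would break this quoting.
  function run(cmd, arg,    line) {
    cmd = cmd " " q arg q
    line = ""
    cmd | getline line
    close(cmd)
    return line
  }
  { print run("lookup_id", $1) "\t" run("scrub_name", $2) }
' "$tmp/input.dat" > "$tmp/output.dat"
```

This starts one process per field per record, so it is likely to be slow on large files.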