How to parse this data in a Linux shell using sed or awk regular expressions

Time: 2012-11-12 08:17:26

Tags: linux shell sed awk

I have this data in my file:

 65 ---
 66 FieldType: Text
 67 FieldName: STATE
 68 FieldNameAlt: STATE
 69 FieldFlags: 4194304
 70 FieldJustification: Left
 71 FieldMaxLength: 2
 72 ---
 73 FieldType: Text
 74 FieldName: ZIP
 75 FieldNameAlt: ZIP
 76 FieldFlags: 0
 77 FieldJustification: Left
 78 ---
 79 FieldType: Signature
 80 FieldName: EMPLOYEE SIGNATURE
 81 FieldNameAlt: EMPLOYEE SIGNATURE
 82 FieldFlags: 0
 83 FieldJustification: Left
 84 ---
 85 FieldType: Text
 86 FieldName: Name_Last
 87 FieldNameAlt: LAST
 88 FieldFlags: 0
 89 FieldValue: Billa
 90 FieldJustification: Left
 91 ---

How can I store this data in an array as key/value pairs, like

array['fieldtype']
array['fieldName']

for all objects?

I think the separator is just "---", but I don't know how to do this.

3 Answers:

Answer 0: (score: 1)

Here is one way to do it with GNU awk. It splits the input into records, which can then be processed one at a time.

parse.awk

BEGIN {
  RS = " +[0-9]+ +---\n"
  FS = "\n"
}

{
  delete array                       # clear fields left over from the previous record
  for(i=1; i<=NF; i++) {             # for each line
    sf = split($i, a, ":")
    if(sf > 1) {                     # only accept successfully split lines
      sub("^ +[0-9]+ +", "", a[1])   # trim key
      sub("^ +", "",  a[2])          # trim value
      array[a[1]] = a[2]             # save into array hash
    }
  }
}

{
  print "Record: " NR
  for(k in array) {
    print k " -> " array[k]
  }
  print ""
}

Save the above to parse.awk and run it as follows:

awk -f parse.awk infile

where infile contains the data you want to parse. Output (Record 1 is empty because the file begins with a separator line; the order of keys within a record is not guaranteed):

Record: 1

Record: 2
FieldFlags -> 4194304
FieldNameAlt -> STATE
FieldJustification -> Left
FieldType -> Text
FieldMaxLength -> 2
FieldName -> STATE

Record: 3
FieldFlags -> 0
FieldNameAlt -> ZIP
FieldJustification -> Left
FieldType -> Text
FieldName -> ZIP

Record: 4
FieldFlags -> 0
FieldNameAlt -> EMPLOYEE SIGNATURE
FieldJustification -> Left
FieldType -> Signature
FieldName -> EMPLOYEE SIGNATURE

Record: 5
FieldFlags -> 0
FieldNameAlt -> LAST
FieldJustification -> Left
FieldType -> Text
FieldValue -> Billa
FieldName -> Name_Last
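
To look up individual fields by name instead of dumping every key, the per-record array can be indexed directly. This is only a minimal sketch under a few assumptions: the input has the same numbered layout as in the question, the awk in use accepts a regular expression as RS (GNU awk does), and the function name list_fields and the array name rec are made up for illustration.

```shell
# Hypothetical helper (not from the original answer):
# print "FieldName (FieldType)" for every record in the file given as $1.
list_fields() {
  awk '
    BEGIN { RS = " +[0-9]+ +---\n"; FS = "\n" }
    {
      delete rec                          # start each record with an empty array
      for (i = 1; i <= NF; i++) {
        line = $i
        sub(/^ +[0-9]+ +/, "", line)      # drop the leading line number
        pos = index(line, ": ")           # split at the first ": " only
        if (pos > 0)
          rec[substr(line, 1, pos - 1)] = substr(line, pos + 2)
      }
      if ("FieldName" in rec)             # skip the empty leading record
        print rec["FieldName"] " (" rec["FieldType"] ")"
    }
  ' "$1"
}
```

Calling list_fields infile prints one line per record. Splitting at the first ": " with index() rather than split() keeps values that themselves contain a colon intact.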

Answer 1: (score: 0)

You can use something like this:

sed -n '/FieldType/,/FieldName/{N};s/FieldType: \([^\n]*\)\nFieldName: \([^\n]*\)/a["\2"]=\1/gp' input >> tmp.sh

and then run:

source tmp.sh

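Alternatively, use eval instead of the redirection and source, but the space in the EMPLOYEE SIGNATURE field will cause problems.

Using Perl would make more sense.

Answer 2: (score: 0)

In any flavor of awk: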

# Run with: awk -f <this file> infile
BEGIN {
    FS = ":[[:blank:]]*"    # fields are separated by a colon plus any blanks after it
    counter = 0
}
/:/ {
    array[counter,$1] = $2
}
/---/ {
    counter++;
}
END {
  # Deal with the array.
}

This creates an array in which each record is numbered by counter and holds the fields shown above, stored as array[x,key] = value.
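
One way the END block might "deal with the array" is to walk the combined subscripts and split them back into record number and key on SUBSEP. This is a hedged sketch rather than the answerer's code: the function name dump_records is hypothetical, the line-number trimming is added on the assumption that the input is numbered as in the question, and [ \t] stands in for [[:blank:]] only to stay compatible with older awk builds.

```shell
# Hypothetical helper: print "record<TAB>key<TAB>value" for each stored field
# in the file given as $1.
dump_records() {
  awk '
    BEGIN { FS = ":[ \t]*"; counter = 0 }
    /:/ {
      key = $1
      sub(/^[ \t]*[0-9]*[ \t]*/, "", key)   # assumption: strip a leading line number
      array[counter, key] = $2
    }
    /---/ { counter++ }
    END {
      for (idx in array) {                  # traversal order is unspecified
        split(idx, part, SUBSEP)            # part[1] = record number, part[2] = key
        print part[1] "\t" part[2] "\t" array[idx]
      }
    }
  ' "$1"
}
```

Because for (idx in array) visits entries in no particular order, piping the result through sort (for example dump_records infile | sort) groups the lines by record number.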