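Converting columns to rows for formatted output

Time: 2014-06-06 12:51:54

Tags: linux unix awk

I want to convert unformatted input into formatted output. Because the input uses multiple delimiters, I got stuck and would appreciate your suggestions.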

Sample_Input.txt

  UMTSGSMPLMNCallDataRecord                                   
    callForwarding                                              
      chargeableDuration                                            0  4 44'BCD
      dateForStartOfCharge                                        09011B'H
      recordSequenceNumber                                        57526'D




UMTSGSMPLMNCallDataRecord                                   
    mSTerminating                                               
      chargeableDuration                                            0  4 44'BCD
      dateForStartOfCharge                                        09011B'H
      recordSequenceNumber                                        57573'D
      originalCalledNumber                                        149212345678'TBCD
      redirectingNumber                                           149387654321'TBCD




!!!!!!!!!!!!!!!!!!!!!!!!1164!!!!!!!!!!!!!!!!!!!!!!


UMTSGSMPLMNCallDataRecord                                   
    mSTerminating                                               
      chargeableDuration                                            0  0 52'BCD
      dateForStartOfCharge                                        09011B'H
      recordSequenceNumber                                        45761'D
      tariffClass                                                 2'D
      timeForStartOfCharge                                          9 46 58'BCD
      calledSubscriberIMSI                                        21329701412F'TBCD

I searched earlier questions and found some relevant information in an answer by Mr. Porges:

 #!/bin/sh
 # split lines on " " and use "," for output field separator
 awk 'BEGIN { FS = " "; i = 0; h = 0; ofs = "," }

   # empty line - increment item count and skip it
   /^\s*$/ { i++ ; next } 

   # normal line - add the item to the object and the header to the header list
   # and keep track of first seen order of headers
   {
      current[i, $1] = $2
      if (!($1 in headers)) {headers_ordered[h++] = $1}
      headers[$1]
   }

   END {
      h--

      # print headers
      for (k = 0; k <= h; k++)
      {
         printf "%s", headers_ordered[k]
         if (k != h) {printf "%s", ofs}
      } 
      print "" 

      # print the items for each object
      for (j = 0; j <= i; j++)
      {
         for (k = 0; k <= h; k++)
         {
            printf "%s", current[j, headers_ordered[k]]
            if (k != h) {printf "%s", ofs}
         }
         print ""
      }
  }' Sample_Input.txt

It produces the output below:

UMTSGSMPLMNCallDataRecord,callForwarding,chargeableDuration,dateForStartOfCharge,recordSequenceNumber,mSTerminating,originalCalledNumber,redirectingNumber,!!!!!!!!!!!!!!!!!!!!!!!!1164!!!!!!!!!!!!!!!!!!!!!!,tariffClass,timeForStartOfCharge,calledSubscriberIMSI
  ,,,,,,,,,,,
  ,,0,09011B'H,57526'D,,,,,,,
  ,,,,,,,,,,,
  ,,,,,,,,,,,
  ,,,,,,,,,,,
  ,,0,09011B'H,57573'D,,149212345678'TBCD,149387654321'TBCD,,,,
  ,,,,,,,,,,,
  ,,,,,,,,,,,
  ,,,,,,,,,,,
  ,,,,,,,,,,,
  ,,,,,,,,,,,
  ,,0,09011B'H,45761'D,,,,,2'D,9,21329701412F'TBCD
  ,,,,,,,,,,,
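This is where I am stuck:

(a) The start of a block needs special handling. A block begins with a line such as "UMTSGSMPLMNCallDataRecord" followed by an empty field, and the next line holds a word such as callForwarding / mSTerminating, also followed by an empty field. The first word ("UMTSGSMPLMNCallDataRecord") needs to be treated as the header of the first column, and the word on the next line (callForwarding / mSTerminating) as the value in that column for the row.

(b) Alphabetic characters must be kept out of the column values, i.e. 09011B'H should become 09011 and 149212345678'TBCD should become 149212345678.

Expected output: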

UMTSGSMPLMNCallDataRecord,chargeableDuration,dateForStartOfCharge,recordSequenceNumber,originalCalledNumber,redirectingNumber,tariffClass,timeForStartOfCharge,calledSubscriberIMSI
callForwarding,0  4 44,09011,57526,,,,,
mSTerminating,0  4 44,09011,57573,149212345678,149387654321,,,
mSTerminating,0  0 52, 09011,45761,,,2,9 46 58,21329701412

Edit: I tried it with the following input:

  UMTSGSMPLMNCallDataRecord                                   
    callForwarding                                              
      chargeableDuration                                            0  4 44'BCD
      dateForStartOfCharge                                        09011B'H
      recordSequenceNumber                                        57526'D




UMTSGSMPLMNCallDataRecord                                   
    mSTerminating                                               
      chargeableDuration                                            0  4 44'BCD
      dateForStartOfCharge                                        09011B'H
      recordSequenceNumber                                        57573'D
      originalCalledNumber                                        149212345678'TBCD
      redirectingNumber                                           149387654321'TBCD




!!!!!!!!!!!!!!!!!!!!!!!!1164!!!!!!!!!!!!!!!!!!!!!!


UMTSGSMPLMNCallDataRecord                                   
    mSTerminating                                               
      chargeableDuration                                            0  0 52'BCD
      dateForStartOfCharge                                        09011B'H
      recordSequenceNumber                                        45761'D
      tariffClass                                                 2'D
      timeForStartOfCharge                                          9 46 58'BCD
      calledSubscriberIMSI                                        21329701412F'TBCD

1 answer:

Answer 0 (score: 0)

Discussion

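This is a tricky problem because the records are not uniform: some fields are missing. Since each record spans multiple lines, we can use AWK's multi-line record feature to handle it by setting the RS (record separator) and FS (field separator) variables. With RS set to the empty string, records are separated by blank lines, and with FS set to a newline, each line of a record becomes one field.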
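To illustrate the multi-line record mode described above, here is a minimal sketch run against a made-up two-record input. The names Record, kind, x and y are hypothetical and not taken from the question. It only counts the fields that AWK sees in each blank-line-separated block:

```sh
# Hypothetical sample: two blocks separated by blank lines.
records=$(printf 'Record\n  kind A\n  x 1\n\n\nRecord\n  kind B\n  x 2\n  y 3\n' |
  awk 'BEGIN { RS = ""; FS = "\n" }      # paragraph mode: one record per block
       { print NR ": " NF " fields" }')   # every line of the block is one field
echo "$records"
```

Any run of blank lines counts as a single separator, so the two empty lines between the blocks do not create an empty record.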

Next, we need to collect the header fields. I don't have a good way to do that, so I hard-coded the header line.
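If hard-coding the header is undesirable, one possible alternative is to buffer every record and emit the header in END, in first-seen order. The following is only a sketch under assumptions, not the method used in this answer: the arrays kind, seen, order and cell are names introduced here for illustration, the sample input is shortened, and the suffix stripping is left out to keep it brief.

```sh
csv=$(printf '%s\n' \
    'UMTSGSMPLMNCallDataRecord' '  callForwarding' \
    '    chargeableDuration      0  4 44' \
    '    recordSequenceNumber    57526' \
    '' '' \
    'UMTSGSMPLMNCallDataRecord' '  mSTerminating' \
    '    recordSequenceNumber    45761' \
    '    tariffClass             2' |
awk 'BEGIN { RS = ""; FS = "\n" }
/UMTSGSMPLMNCallDataRecord/ {
    n++
    kind[n] = $2
    gsub(/^[ \t]+|[ \t]+$/, "", kind[n])       # trim the record type
    for (i = 3; i <= NF; i++) {
        line = $i
        gsub(/^[ \t]+|[ \t]+$/, "", line)
        name = line
        sub(/[ \t].*$/, "", name)              # first word is the column name
        value = substr(line, length(name) + 1)
        sub(/^[ \t]+/, "", value)              # the rest is the value
        if (!(name in seen)) { seen[name] = 1; order[++h] = name }
        cell[n, name] = value
    }
}
END {
    row = "UMTSGSMPLMNCallDataRecord"
    for (k = 1; k <= h; k++) row = row "," order[k]
    print row
    for (r = 1; r <= n; r++) {
        row = kind[r]
        for (k = 1; k <= h; k++) row = row "," cell[r, order[k]]
        print row
    }
}')
echo "$csv"
```

The trade-off is memory: every cell is held until the end of the input, whereas the hard-coded header lets each record be printed as soon as it is read.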

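Once the order of the header columns is established, we need a way to extract a specific field from a record, which we do with the function get_column(). This function also strips the non-numeric data at the end of the value, as you requested.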
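The suffix stripping can be tried in isolation. This is a small sketch, assuming every unwanted suffix consists only of letters and a single quote at the end of the value, as in the sample data. The quote is passed in through a variable named q (a name chosen here) purely to avoid shell quoting problems:

```sh
cleaned=$(printf "%s\n" "09011B'H" "149212345678'TBCD" "0  4 44'BCD" |
  awk -v q="'" '{ sub("[A-Za-z" q "]+$", ""); print }')   # drop trailing letters and quotes
echo "$cleaned"
```

Anchoring the pattern with $ limits the removal to the end of the value, so letters earlier in a value would be left alone.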

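One last thing: we need to trim the whitespace around the first output column ($2, the second line of each record) using the home-made trim() function.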
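As a quick check of what the trimming does, the sketch below applies the same two sub() calls to a padded line and prints the result in brackets, which are added here only to make the boundaries visible:

```sh
trimmed=$(printf '    callForwarding        \n' |
  awk 'function trim(s) {
           sub(/[ \t]+$/, "", s)   # strip trailing spaces and tabs
           sub(/^[ \t]+/, "", s)   # strip leading spaces and tabs
           return s
       }
       { print "[" trim($0) "]" }')
echo "$trimmed"
```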

Command Line

I put the code in make_csv.awk. To run it:

awk -f make_csv.awk Sample_Input.txt

File make_csv.awk

BEGIN { 
    # Next two lines: each record is of multiple lines, each line is a
    # separate field
    RS = ""
    FS = "\n"
    print "UMTSGSMPLMNCallDataRecord,chargeableDuration,dateForStartOfCharge,recordSequenceNumber,originalCalledNumber,redirectingNumber,tariffClass,timeForStartOfCharge,calledSubscriberIMSI"
}

function get_column(name,    i, j, f, len, result) {
    # i, j, f, len and result are "local" variables
    for (i = 1; i <= NF; i++) {
        len = split($i, f, " ")
        if (f[1] == name) {
            # Rejoin the value, which may itself contain spaces
            result = f[2]
            for (j = 3; j <= len; j++) {
                result = result " " f[j]
            }

            # Remove the trailing non-numeric suffix, e.g. 'BCD or B'H
            sub(/[a-zA-Z']+$/, "", result)
            return result
        }
    }
    return "" # column not found, return empty string
}

# Remove leading and trailing spaces
function trim(s) {
    sub(/[ \t]+$/, "", s)
    sub(/^[ \t]+/, "", s)
    return s
}

/UMTSGSMPLMNCallDataRecord/ {
    print trim($2) \
        "," get_column("chargeableDuration") \
        "," get_column("dateForStartOfCharge") \
        "," get_column("recordSequenceNumber") \
        "," get_column("originalCalledNumber") \
        "," get_column("redirectingNumber") \
        "," get_column("tariffClass") \
        "," get_column("timeForStartOfCharge") \
        "," get_column("calledSubscriberIMSI") \
        ""
}

Update

I ran the AWK script against AVN's latest input and got the following output:

UMTSGSMPLMNCallDataRecord,chargeableDuration,dateForStartOfCharge,recordSequenceNumber,originalCalledNumber,redirectingNumber,tariffClass,timeForStartOfCharge,calledSubscriberIMSI
callForwarding,0 4 44,09011,57526,,,,,
mSTerminating,0 4 44,09011,57573,149212345678,149387654321,,,
mSTerminating,0 0 52,09011,45761,,,2,9 46 58,21329701412