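I would like to convert unformatted input into formatted output. Because the input uses multiple delimiters, I am stuck and would appreciate your suggestions.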
Sample_Input.txt
UMTSGSMPLMNCallDataRecord
callForwarding
chargeableDuration 0 4 44'BCD
dateForStartOfCharge 09011B'H
recordSequenceNumber 57526'D
UMTSGSMPLMNCallDataRecord
mSTerminating
chargeableDuration 0 4 44'BCD
dateForStartOfCharge 09011B'H
recordSequenceNumber 57573'D
originalCalledNumber 149212345678'TBCD
redirectingNumber 149387654321'TBCD
!!!!!!!!!!!!!!!!!!!!!!!!1164!!!!!!!!!!!!!!!!!!!!!!
UMTSGSMPLMNCallDataRecord
mSTerminating
chargeableDuration 0 0 52'BCD
dateForStartOfCharge 09011B'H
recordSequenceNumber 45761'D
tariffClass 2'D
timeForStartOfCharge 9 46 58'BCD
calledSubscriberIMSI 21329701412F'TBCD
I searched previous questions and found some relevant information in Mr. Porges's answer:
#!/bin/sh
# split lines on " " and use "," for output field separator
awk 'BEGIN { FS = " "; i = 0; h = 0; ofs = "," }
# empty line - increment item count and skip it
/^\s*$/ { i++ ; next }
# normal line - add the item to the object and the header to the header list
# and keep track of first seen order of headers
{
    current[i, $1] = $2
    if (!($1 in headers)) {headers_ordered[h++] = $1}
    headers[$1]
}
END {
    h--
    # print headers
    for (k = 0; k <= h; k++)
    {
        printf "%s", headers_ordered[k]
        if (k != h) {printf "%s", ofs}
    }
    print ""
    # print the items for each object
    for (j = 0; j <= i; j++)
    {
        for (k = 0; k <= h; k++)
        {
            printf "%s", current[j, headers_ordered[k]]
            if (k != h) {printf "%s", ofs}
        }
        print ""
    }
}' Sample_Input.txt
It produces the output below:
UMTSGSMPLMNCallDataRecord,callForwarding,chargeableDuration,dateForStartOfCharge,recordSequenceNumber,mSTerminating,originalCalledNumber,redirectingNumber,!!!!!!!!!!!!!!!!!!!!!!!!1164!!!!!!!!!!!!!!!!!!!!!!,tariffClass,timeForStartOfCharge,calledSubscriberIMSI
,,,,,,,,,,,
,,0,09011B'H,57526'D,,,,,,,
,,,,,,,,,,,
,,,,,,,,,,,
,,,,,,,,,,,
,,0,09011B'H,57573'D,,149212345678'TBCD,149387654321'TBCD,,,,
,,,,,,,,,,,
,,,,,,,,,,,
,,,,,,,,,,,
,,,,,,,,,,,
,,,,,,,,,,,
,,0,09011B'H,45761'D,,,,,2'D,9,21329701412F'TBCD
,,,,,,,,,,,
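Where I am stuck:
(a) Handling the start of a block: the line "UMTSGSMPLMNCallDataRecord" has no value after it, and neither does the word on the next line (callForwarding / mSTerminating). The first word ("UMTSGSMPLMNCallDataRecord") should become the heading of the first column, and the word on the next line (callForwarding / mSTerminating) should become that column's value in each row.
(b) Keeping alphabetic characters out of the column values, i.e. 09011B'H should become 09011 and 149212345678'TBCD should become 149212345678.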
Expected output:
UMTSGSMPLMNCallDataRecord,chargeableDuration,dateForStartOfCharge,recordSequenceNumber,originalCalledNumber,redirectingNumber,tariffClass,timeForStartOfCharge,calledSubscriberIMSI
callForwarding,0 4 44,09011,57526,,,,,
mSTerminating,0 4 44,09011,57573,149212345678,149387654321,,,
mSTerminating,0 0 52,09011,45761,,,2,9 46 58,21329701412
Edit: I tried with the following input:
UMTSGSMPLMNCallDataRecord
callForwarding
chargeableDuration 0 4 44'BCD
dateForStartOfCharge 09011B'H
recordSequenceNumber 57526'D
UMTSGSMPLMNCallDataRecord
mSTerminating
chargeableDuration 0 4 44'BCD
dateForStartOfCharge 09011B'H
recordSequenceNumber 57573'D
originalCalledNumber 149212345678'TBCD
redirectingNumber 149387654321'TBCD
!!!!!!!!!!!!!!!!!!!!!!!!1164!!!!!!!!!!!!!!!!!!!!!!
UMTSGSMPLMNCallDataRecord
mSTerminating
chargeableDuration 0 0 52'BCD
dateForStartOfCharge 09011B'H
recordSequenceNumber 45761'D
tariffClass 2'D
timeForStartOfCharge 9 46 58'BCD
calledSubscriberIMSI 21329701412F'TBCD
Answer 0 (score: 0)
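This is a tricky problem because the records are not uniform: some fields are missing. Since each record spans multiple lines, we can use AWK's multi-line record feature by setting the RS (record separator) and FS (field separator) variables. With RS set to the empty string, AWK treats each block of lines separated by blank lines as one record; with FS set to a newline, each line of the block becomes one field.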
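To see what that mode does before reading the full script, here is a minimal sketch. It assumes the records are separated by blank lines, and show_records is a made-up helper name that is not part of make_csv.awk.

```shell
#!/bin/sh
# show_records is a hypothetical helper, not part of make_csv.awk.
# It reads blank-line-separated records from standard input and reports
# how many lines (fields) each record has and what its second line is.
show_records() {
    awk 'BEGIN { RS = ""; FS = "\n" }
         { print "record " NR ": " NF " fields, $2 = " $2 }'
}

# Two shortened records, separated by one blank line (an assumed layout)
printf "%s\n" \
    "UMTSGSMPLMNCallDataRecord" \
    "callForwarding" \
    "recordSequenceNumber 57526'D" \
    "" \
    "UMTSGSMPLMNCallDataRecord" \
    "mSTerminating" \
    "tariffClass 2'D" \
    "timeForStartOfCharge 9 46 58'BCD" | show_records
```

This should report 3 fields for the first record and 4 for the second, with callForwarding and mSTerminating as the respective second fields, which is why the script can read the call type from $2.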
Next, we need to deal with collecting the header fields. I don't have a good way to do that, so I hard-coded the header line.
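If hard-coding is not acceptable, one possible approach is to collect the field names in first-seen order in a separate pass. The sketch below is only an illustration under assumptions: records are separated by blank lines, the first two lines of each record are the record marker and the call type, and lines whose first word does not start with a letter (such as the !!! separator row) are ignored. collect_headers is a hypothetical name.

```shell
#!/bin/sh
# collect_headers is a hypothetical helper, not part of make_csv.awk.
# It prints every distinct field name, comma-separated, in first-seen order.
collect_headers() {
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        # Lines 1 and 2 are assumed to be the record marker and call type;
        # the remaining lines are "name value..." pairs.
        for (i = 3; i <= NF; i++) {
            split($i, parts, " ")
            # Skip separator rows and anything else not starting with a letter
            if (parts[1] !~ /^[A-Za-z]/) continue
            if (!(parts[1] in seen)) {
                seen[parts[1]] = 1
                order[++n] = parts[1]
            }
        }
    }
    END {
        for (k = 1; k <= n; k++)
            printf "%s%s", order[k], (k < n ? "," : "\n")
    }'
}

printf "%s\n" \
    "UMTSGSMPLMNCallDataRecord" \
    "callForwarding" \
    "chargeableDuration 0 4 44'BCD" \
    "" \
    "UMTSGSMPLMNCallDataRecord" \
    "mSTerminating" \
    "chargeableDuration 0 0 52'BCD" \
    "tariffClass 2'D" | collect_headers
```

For this shortened input the helper should print chargeableDuration,tariffClass. The resulting list could then drive the column order instead of a fixed header string, at the cost of reading the input twice or buffering the rows.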
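Once the header has fixed the column order, we need a way to extract a specific field from a record, which is what the get_column() function does. It also strips the trailing non-numeric data, as you requested.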
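The suffix stripping can be tried in isolation. This is a small sketch rather than the exact code from the script: strip_suffix is a made-up helper, and it assumes the type suffix is always a trailing run of letters and apostrophes.

```shell
#!/bin/sh
# strip_suffix is a hypothetical helper for illustration only.
# It removes a trailing run of letters and apostrophes from each input line.
# The apostrophe is passed in through -v so that it does not end the
# single-quoted AWK program.
strip_suffix() {
    awk -v q="'" '{ sub("[A-Za-z" q "]+$", ""); print }'
}

printf "%s\n" "09011B'H" "149212345678'TBCD" "0 4 44'BCD" "2'D" | strip_suffix
```

The four sample values should come out as 09011, 149212345678, 0 4 44 and 2, one per line. Anchoring the pattern with $ means only the end of the value is touched, so letters earlier in a value would be left alone.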
Finally, we need to trim the surrounding whitespace from the first column ($2) with a home-made trim() function.
I put the code in make_csv.awk. To run it:
awk -f make_csv.awk Sample_Input.txt
BEGIN {
    # Next two lines: each record is of multiple lines, each line is a
    # separate field
    RS = ""
    FS = "\n"
    print "UMTSGSMPLMNCallDataRecord,chargeableDuration,dateForStartOfCharge,recordSequenceNumber,originalCalledNumber,redirectingNumber,tariffClass,timeForStartOfCharge,calledSubscriberIMSI"
}

function get_column(name,    i, j, f, len, result) {
    # i, j, f, len and result are "local" variables
    for (i = 1; i <= NF; i++) {
        len = split($i, f, " ")
        if (f[1] == name) {
            # Rejoin the value words (everything after the field name)
            result = f[2]
            for (j = 3; j <= len; j++) {
                result = result " " f[j]
            }
            # Remove the trailing non-numeric suffix, e.g. B'H or 'TBCD
            sub(/[a-zA-Z']+$/, "", result)
            return result
        }
    }
    return ""    # column not found, return empty string
}

# Remove leading and trailing spaces
function trim(s) {
    sub(/[ \t]+$/, "", s)
    sub(/^[ \t]+/, "", s)
    return s
}

# One output row per record; $2 is the call type on the record's second line
/UMTSGSMPLMNCallDataRecord/ {
    print trim($2) \
        "," get_column("chargeableDuration") \
        "," get_column("dateForStartOfCharge") \
        "," get_column("recordSequenceNumber") \
        "," get_column("originalCalledNumber") \
        "," get_column("redirectingNumber") \
        "," get_column("tariffClass") \
        "," get_column("timeForStartOfCharge") \
        "," get_column("calledSubscriberIMSI")
}
I ran the AWK script against AVN's latest input and got the following output:
UMTSGSMPLMNCallDataRecord,chargeableDuration,dateForStartOfCharge,recordSequenceNumber,originalCalledNumber,redirectingNumber,tariffClass,timeForStartOfCharge,calledSubscriberIMSI
callForwarding,0 4 44,09011,57526,,,,,
mSTerminating,0 4 44,09011,57573,149212345678,149387654321,,,
mSTerminating,0 0 52,09011,45761,,,2,9 46 58,21329701412