Concatenating strings with multiple arrays

Date: 2020-09-22 02:23:30

Tags: awk

I am trying to rearrange the values from specific strings into their corresponding columns. Here is the input:

String 1:  47/13528 
String 2:  55(s) 
String 3:   
String 4:  114(n) 
String 5:  225(s), 26/10533-10541 
String 6:  103/13519 
String 7:  10(s), 162(n) 
String 8:  152/12345,12346
(d=dead, n=null, s=strike) 

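The letter in each value is a flag (d = dead, n = null, s = strike). Each value (number) is joined to its string number with a "c", so the value 47 from String 1 becomes 47c1, and so on: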

String 1:  47/13528 
A value without any flag is sorted into the null column, together with the values tagged (n).
String 1 (the integer 1 is concatenated with 47/13528)


Sorted : 
null
47c1@SP13528;114c4;103c6@SP13519;162c7


Str#2:  55(s)
A value flagged with (s) is sorted into the strike column.

Sorted :
strike
55c2;225c5;26c5@SP10533-10541;162c7

I am trying to parse it by modifying some earlier code, but without any luck:

{
    for (i=1; i<=NF; i++) {
        num  = $i+0
        abbr = $i
        gsub(/[^[:alpha:]]/,"",abbr)
        list[abbr] = list[abbr] num " c " val ORS
    }
}
END {
    n = split("dead null strike",types)
    for (i=1; i<=n; i++) {
        name = types[i]
        abbr = substr(name,1,1)
        printf "name,list[abbr]\n" 
    }
}

Expected output (as CSV):

dead,null,strike
,47c1@SP13528;114c4; 26c5@SP10533-10541;103c6@SP13519;162c7, 152c8@SP12345;152c8@SP12346,55c2;225c5;162c7;10c7

Broken down for cross-checking:

dead
none 

null
47c1@SP13528;114c4;103c6@SP13519;162c7;152c8@SP12345;152c8@SP12346;26c5@SP10533-10541;;162c7

strike
55c2;225c5;10c7

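2 answers:

Answer 0: (score: 1)

My usual approach is:

  1. First preprocess the data so that each line carries one piece of information.
  2. Then preprocess the data so that each line carries one piece of information per column.
  3. After that it is easy: just accumulate the columns into an array in awk and print them.

The following code: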

cat <<EOF |
String 1:  47/13528 
String 2:  55(s) 
String 3:   
String 4:  114(n) 
String 5:  225(s), 26/10533-10541 
String 6:  103/13519 
String 7:  10(s), 162(n) 
String 8:  152/12345,12346
(d=dead, n=null, s=strike) 
EOF
sed '
    # filter only lines with String
    /^String \([0-9]*\): */!d;
    # Remove the String
    # Remove the : and spaces
    s//\1 /
    # remove trailing spaces
    s/ *$//
    # Remove lines with nothing
    /^[0-9]* *$/d
    # remove the commas and split lines on comma
    # by moving them to separate lines
    # repeat for as long as a comma is still found
    : a
    /\([0-9]*\) \(.*\), *\(.*\)/{
        s//\1 \2\n\1 \3/
        ba
    }
' | sed '
    # we should be having two fields here
    # separated by a single space
    /^[^ ]* [^ ]*$/!{
        s/.*/ERROR: "&"/
        q1
    }
    # Move the name in braces to separate column
    /(\(.\))$/{
        s// \1/
        b not
    } ; {
        # default is n
        s/$/ n/
    } ; : not
    # shuffle first and second field
    # to that <num>c<num>(@SP<something>)? format
    # if second field has a "/"
    \~^\([0-9]*\) \([0-9]*\)/\([^ ]*\)~{
        # then add a SP
        s//\2c\1@SP\3/
        b not2
    } ; {
        # otherwise just do a "c" between
        s/\([0-9]*\) \([0-9]*\)/\2c\1/
    } ; : not2
' |
sort -n -k1 |
# now it's trivial
awk '
{ 
    out[$2] = out[$2] (!length(out[$2])?"":";") $1
}

function outputit(name, idx) {
    print name
    if (length(out[idx]) == 0) {
        print "none"
    } else {
        print out[idx]
    }
    printf "\n"
}

END{
    outputit("dead", "d")
    outputit("null", "n")
    outputit("strike", "s")
}
'

This outputs (on repl):

dead
none

null
26c5@SP10533-10541;47c1@SP13528;103c6@SP13519;114c4;152c8@SP12345;162c7;12346c8

strike
10c7;55c2;225c5

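I believe the output matches yours apart from the order within the ;-separated lists: you seem to order by the first column and then the second, whereas I simply piped everything through sort. Note also that the second value of String 8 comes out as 12346c8 rather than the 152c8@SP12346 of the expected output, because every comma-separated piece is treated as an independent value.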
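The expected output in the question appears to list items in string-number order (the number after the "c"), not in order of the value itself. A minimal sketch of a sort step keyed that way is shown here; it assumes each input line has the form "<value>c<string-no>[@SP...] <flag>" with exactly one lowercase "c", and the function name join_by_flag is made up for illustration.

```shell
# Hypothetical helper: order lines by string number, then by value,
# and join the items of each flag with ";".
join_by_flag() {
    # -t c splits each line at the single "c"; field 2 starts with the
    # string number, field 1 is the value. LC_ALL=C keeps -n locale-neutral.
    LC_ALL=C sort -t c -k2,2n -k1,1n |
    awk '
        { out[$2] = out[$2] (out[$2] == "" ? "" : ";") $1 }
        END {
            print "null=" out["n"]
            print "strike=" out["s"]
        }'
}

printf '%s\n' '225c5 s' '26c5@SP10533-10541 n' '47c1@SP13528 n' \
              '10c7 s' '55c2 s' '114c4 n' | join_by_flag
# null=47c1@SP13528;114c4;26c5@SP10533-10541
# strike=55c2;225c5;10c7
```

In the pipeline above this would stand in for the plain "sort -n -k1" step; the accumulation in awk is unchanged in spirit.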

Answer 1: (score: 1)

Here is an awk script that parses the file.

BEGIN {
    types["d"]; types["n"]; types["s"]
    deft = "n"; OFS = ","; sep = ";"
}

$1=="String" {
    gsub(/[)(]/,""); gsub(",", " ")    # general line subs
    for (i=3;i<=NF;i++) {
        if (!gsub("/","c"$2+0"@SP", $i)) $i = $i"c"$2+0    # make all subs on items
        for (t in types) { if (gsub(t, "", $i)) { x=t; break }; x=deft } #find type
        items[x] = items[x]? items[x] sep $i: $i    # append for type found
    }
}

END {
    print "dead" OFS "null" OFS "strike"
    print items["d"] OFS items["n"] OFS items["s"]
}

Input:

String 1:  47/13528 
String 2:  55(s) 
String 3:   
String 4:  114(n) 
String 5:  225(s), 26/10533-10541 
String 6:  103/13519 
String 7:  10(s), 162(n) 
String 8:  152/12345,12346
(d=dead, n=null, s=strike) 

Output:

> awk -f tst.awk file
dead,null,strike
,47c1@SP13528;114c4;26c5@SP10533-10541;103c6@SP13519;162c7;152c8@SP12345;12346c8,55c2;225c5;10c7
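The output above prints 12346c8 for the second value of String 8, whereas the expected output in the question lists 152c8@SP12346. The sketch below is one possible way to close that gap. It rests on an assumption the question does not confirm: a bare, unflagged number that follows a value/SP pair on the same line is a further SP entry for the same value. The function name to_csv and its variable names are hypothetical.

```shell
# Hypothetical helper: reads the "String N: ..." lines on stdin, prints the CSV.
to_csv() {
    awk '
    $1 == "String" {
        no = $2 + 0                          # "8:" converts to 8
        line = $0
        sub(/^String [0-9]+:/, "", line)     # drop the label
        gsub(/[ \t]/, "", line)              # drop all blanks
        n = split(line, piece, ",")
        prefix = ""                          # value of the last value/SP pair on this line
        for (i = 1; i <= n; i++) {
            p = piece[i]
            if (p == "") continue
            hasflag = match(p, /\([dns]\)$/)
            flag = hasflag ? substr(p, RSTART + 1, 1) : "n"   # default flag is n
            if (hasflag) p = substr(p, 1, RSTART - 1)
            if (split(p, part, "/") == 2) {
                prefix = part[1]
                item = prefix "c" no "@SP" part[2]
            } else if (!hasflag && prefix != "") {
                # ASSUMPTION: a bare number after a value/SP pair is another SP entry
                item = prefix "c" no "@SP" p
            } else {
                prefix = ""
                item = p "c" no
            }
            out[flag] = out[flag] (out[flag] == "" ? "" : ";") item
        }
    }
    END {
        print "dead,null,strike"
        print out["d"] "," out["n"] "," out["s"]
    }'
}

printf '%s\n' 'String 2:  55(s) ' 'String 8:  152/12345,12346' | to_csv
# dead,null,strike
# ,152c8@SP12345;152c8@SP12346,55c2
```

Items are appended in input order, so no separate sort step is needed to get the lists in string-number order.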

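Your description kept changing in important details, such as how the type of an item is determined or how items are separated, and so far your input and output are not consistent with it. In general, though, I think you can easily follow what this script does. Keep in mind that gsub() returns the number of substitutions it made in addition to performing them, so it is often convenient to use it as a condition.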
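A small illustration of using the return value of gsub() as a condition, written as a sketch with a made-up helper name (tag_slash) and made-up sample values:

```shell
# Hypothetical helper showing gsub() used directly as a condition.
tag_slash() {
    awk '{
        # gsub() rewrites $0 in place and returns the number of replacements,
        # so a non-zero result also tells us the pattern was present.
        if (gsub("/", "@SP")) print "slash found: " $0
        else                  print "no slash: " $0
    }'
}

printf '%s\n' '26/10533' '225' | tag_slash
# slash found: 26@SP10533
# no slash: 225
```

The script in this answer relies on the same idea twice: once to decide whether an item contained a "/", and once to find which flag letter was present.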
