I am trying to rearrange values from specific strings into their corresponding columns. Here is the input:
String 1: 47/13528
String 2: 55(s)
String 3:
String 4: 114(n)
String 5: 225(s), 26/10533-10541
String 6: 103/13519
String 7: 10(s), 162(n)
String 8: 152/12345,12346
(d=dead, n=null, s=strike)
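The letter in each value is a flag (d = dead, n = null, s = strike). The string number is appended to each value, so the value 47 in String 1 becomes 47c1, and so on: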
String 1: 47/13528
A value without any flag is sorted into the null column, together with the values tagged (n).
For String 1, the string number is concatenated onto the value 47/13528.
Sorted :
null
47c1@SP13528;114c4;103c6@SP13519;162c7
String 2: 55(s)
A value flagged with (s) is sorted into the strike column.
Sorted :
strike
55c2;225c5;26c5@SP10533-10541;162c7
I tried to parse it by modifying some earlier code, but with no luck:
{
    for (i=1; i<=NF; i++) {
        num = $i+0
        abbr = $i
        gsub(/[^[:alpha:]]/,"",abbr)
        list[abbr] = list[abbr] num " c " val ORS
    }
}
END {
    n = split("dead null strike",types)
    for (i=1; i<=n; i++) {
        name = types[i]
        abbr = substr(name,1,1)
        printf "name,list[abbr]\n"
    }
}
Expected output (as CSV):
dead,null,strike
,47c1@SP13528;114c4; 26c5@SP10533-10541;103c6@SP13519;162c7, 152c8@SP12345;152c8@SP12346,55c2;225c5;162c7;10c7
Broken down for cross-checking:
dead
none
null
47c1@SP13528;114c4;103c6@SP13519;162c7;152c8@SP12345;152c8@SP12346;26c5@SP10533-10541;;162c7
strike
55c2;225c5;10c7
Answer 0 (score: 1)
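My usual approach is to preprocess the input so that each line carries a single item; after that it is enough to accumulate the columns in awk and print them. The following code: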
cat <<EOF |
String 1: 47/13528
String 2: 55(s)
String 3:
String 4: 114(n)
String 5: 225(s), 26/10533-10541
String 6: 103/13519
String 7: 10(s), 162(n)
String 8: 152/12345,12346
(d=dead, n=null, s=strike)
EOF
sed '
# filter only lines with String
/^String \([0-9]*\): */!d;
# Remove the String
# Remove the : and spaces
s//\1 /
# remove trailing spaces
s/ *$//
# Remove lines with nothing
/^[0-9]* *$/d
# remove the commas and split lines on comma
# by moving them to separate lines
# repeat that until no comma is left
: a
/\([0-9]*\) \(.*\), *\(.*\)/{
s//\1 \2\n\1 \3/
ba
}
' | sed '
# we should be having two fields here
# separated by a single space
/^[^ ]* [^ ]*$/!{
s/.*/ERROR: "&"/
q1
}
# Move the name in braces to separate column
/(\(.\))$/{
s// \1/
b not
} ; {
# default is n
s/$/ n/
} ; : not
# swap the first and second fields
# into the <num>c<num>(@SP<something>)? format
# if second field has a "/"
\~^\([0-9]*\) \([0-9]*\)/\([^ ]*\)~{
# then add a SP
s//\2c\1@SP\3/
b not2
} ; {
# otherwise just do a "c" between
s/\([0-9]*\) \([0-9]*\)/\2c\1/
} ; : not2
' |
sort -n -k1 |
# now it's trivial
awk '
{
    out[$2] = out[$2] (!length(out[$2])?"":";") $1
}
function outputit(name, idx) {
    print name
    if (length(out[idx]) == 0) {
        print "none"
    } else {
        print out[idx]
    }
    printf "\n"
}
END{
    outputit("dead", "d")
    outputit("null", "n")
    outputit("strike", "s")
}
'
Output:
dead
none
null
26c5@SP10533-10541;47c1@SP13528;103c6@SP13519;114c4;152c8@SP12345;162c7;12346c8
strike
10c7;55c2;225c5
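I believe the output matches yours apart from the ordering of the ;-separated lists: you seem to sort on the first column and then the second, whereas I simply sorted with sort.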
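If the ordering matters, one possible way to get it is to sort on the string number (the part after the c) first and on the value second. The following is only a sketch under assumptions: sort_by_string is a hypothetical helper name, the intermediate lines are assumed to look like "47c1@SP13528 n", and the letter c is assumed to occur only as the separator between value and string number.

```shell
# sort_by_string is a hypothetical name used only for this illustration.
# Assumption: every line has the form <value>c<string>[@SP...] <flag>,
# and "c" appears nowhere else in the line.
sort_by_string() {
    # -t c      : split each line into fields at the letter c
    # -k2,2n    : primary key, the string number (leading digits of field 2)
    # -k1,1n    : secondary key, the value in front of the c
    sort -t c -k2,2n -k1,1n
}

printf '%s\n' '162c7 n' '26c5@SP10533-10541 n' '10c7 s' '47c1@SP13528 n' |
    sort_by_string
```

Swapping this in for the plain sort -n -k1 stage would order the items by string number, with ties broken by value.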
Answer 1 (score: 1)
Here is an awk script for parsing the file.
BEGIN {
    types["d"]; types["n"]; types["s"]
    deft = "n"; OFS = ","; sep = ";"
}
$1 == "String" {
    gsub(/[)(]/, ""); gsub(",", " ")    # general line subs
    for (i = 3; i <= NF; i++) {
        if (!gsub("/", "c" ($2+0) "@SP", $i)) $i = $i "c" ($2+0)    # make all subs on items
        for (t in types) { if (gsub(t, "", $i)) { x = t; break }; x = deft }    # find type
        items[x] = items[x] ? items[x] sep $i : $i    # append for type found
    }
}
END {
    print "dead" OFS "null" OFS "strike"
    print items["d"] OFS items["n"] OFS items["s"]
}
Input:
String 1: 47/13528
String 2: 55(s)
String 3:
String 4: 114(n)
String 5: 225(s), 26/10533-10541
String 6: 103/13519
String 7: 10(s), 162(n)
String 8: 152/12345,12346
(d=dead, n=null, s=strike)
Output:
> awk -f tst.awk file
dead,null,strike
,47c1@SP13528;114c4;26c5@SP10533-10541;103c6@SP13519;162c7;152c8@SP12345;12346c8,55c2;225c5;10c7
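Your description has changed in important details, such as how we determine the type of an item or how items are separated, and even now your input and expected output are not consistent with it. Overall, though, I think you can easily get what you need from this script. Keep in mind that gsub() returns the number of substitutions it made as well as performing them, so it is often convenient to use it as a condition.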
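As a small illustration of that last point, the sketch below prints the return value of gsub() next to the modified text. The helper name count_subs is hypothetical and exists only for this example; the two input values are taken from the question.

```shell
# count_subs is a hypothetical name used only for this illustration.
count_subs() {
    awk '{
        item = $0
        # gsub() rewrites item in place and returns the number of
        # replacements, so a result of 0 means no slash was found.
        n = gsub("/", "@SP", item)
        print n, item
    }'
}

printf '%s\n' '47/13528' '55(s)' | count_subs
```

Because a return value of 0 is false in awk, writing if (!gsub(...)) runs the fallback branch only for items in which nothing was replaced.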