I currently have code that handles this type of file:
$ cat myfile.txt
XSAP_SM1_100 COR-REV-SAPQ-P09 - 10/14/2013 -
SCHEDULE XSAP_SM1_100#COR-REV-SAPQ-P09 TIMEZONE Europe/Paris
ON RUNCYCLE RULE1 "FREQ=WEEKLY;BYDAY=WE"
EXCEPT RUNCYCLE CALENDAR2 FR1DOFF -1 DAYS
EXCEPT RUNCYCLE SIMPLE3 11/11/2011
AT 0530
:
XSAP_SM1_100#CORREVSAPQP09-01
AT 0640 TIMEZONE Europe/Paris
XSAP_SM1_100#CORREVSAPQP09-02
AT 0645 TIMEZONE Europe/Paris
The code is:
awk 'BEGIN { RS=":"; FS="\n"}
NR==2 {
for(i=1;i<=NF;++i) {
if($i !~ /^$/) {
split($i,tmp,"#")
i=i+1
split($i,tmp2," ")
printf "\"%s\",\"%s\",\"%s\"\n", tmp[1],tmp[2],tmp2[2]
}
}
}'
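However, I also have another type of file. I will be running this command over 1000 files in a for loop, but I have narrowed the problem down to the type below, for which it does not work as expected.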
$ cat testing.txt
ODSSLT_P09 COR-ODS-SMT9-B01 - 12/29/2015 -
SCHEDULE ODSSLT_P09#COR-ODS-SMT9-B01 TIMEZONE UTC
ON RUNCYCLE RULE1 "FREQ=DAILY;"
AT 0505
PRIORITY 11
:
ODSSLT_P09#CORODSSMT9001-01
UNTIL 2355 TIMEZONE Asia/Shanghai
EVERY 0100
ODSSLT_P09#CORODSSMT9001-02
AT 2355
EVERY 0100
ODSSLT_P09#CORODSSMT9001-03
ODSSLT_P09#CORODSSMT9001-04
UNTIL 2355 TIMEZONE Asia/Shanghai
EVERY 0100
EOF
Expected output for this file:
"ODSSLT_P09","CORODSSMT9001-01",""
"ODSSLT_P09","CORODSSMT9001-02","2355"
"ODSSLT_P09","CORODSSMT9001-03",""
"ODSSLT_P09","CORODSSMT9001-04",""
The code actually being run is:
| grep -v -i -w -E "CONFIRMED|DEADLINE|DAY|DAYS|EVERY|NEEDS|OPENS|PRIORITY|PROMPT|UNTIL|AWSBIA291I|END|FOLLOWS" |
awk 'BEGIN { RS=":"; FS="\n"}
NR==2 {for(i=1;i<=NF;++i) {
if($i !~ /^$/) {
split($i,tmp,"#")
i=i+1
split($i,tmp2," ")
printf "\"%s\",\"%s\",\"%s\"\n", tmp[1],tmp[2],tmp2[2]
}}}'
The output is just:
"ODSSLT_P09","CORODSSMT9001-01",""
"AT 2355","",""
"ODSSLT_P09","CORODSSMT9001-04",""
Answer 0: (score: 1)
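The best solution would be a small awk program that does everything (awk loops over the input by itself, so write something without a while loop).
Since you tagged the question ksh rather than bash or linux, I don't trust your version of awk.
First try joining the lines and splitting them again, except where the next line starts with AT. I hope no line contains the string EOL, so I will use EOL as the join marker.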
sed 's/$/EOL/' myfile.txt |
tr -d "\n" |
sed -e 's/EOLAT/ AT/g' -e 's/EOL/\n/g'
Your version of sed may not understand \n in the replacement; in that case, replace it with a real newline.
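A minimal sketch of that fallback: in a POSIX sed replacement, a backslash immediately followed by a real newline inserts a newline. The three-line sample is made up for illustration and only stands in for myfile.txt.

```shell
# Hypothetical sample standing in for myfile.txt.
sample='JOB#ONE
AT 0640
JOB#TWO'

# Join all lines with an EOL marker, glue each AT line onto the line
# before it, then split again using a literal newline instead of \n.
joined=$(printf '%s\n' "$sample" |
  sed 's/$/EOL/' |
  tr -d '\n' |
  sed -e 's/EOLAT/ AT/g' -e 's/EOL/\
/g')

printf '%s\n' "$joined"
```

The replacement text has to continue on the next line right after the backslash, so the second sed expression cannot be indented.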
I know what I want to do with the sed output, so I will filter before sed and change the sed commands.
foundcolon="0";
grep -E "^:$|XSAP|AT" myfile.txt |
sed 's/$/EOL/' |
tr -d "\n" |
sed -e 's/EOLAT//g' -e 's/EOL/\n/g' -e 's/#/ /g' |
while read -r xsap corr numm rest_of_line; do
if [ "${foundcolon}" = "0" ]; then
if [ "${xsap}" = ":" ]; then
foundcolon="1"
fi
continue
fi
printf '"%s","%s","%s"\n' "${xsap}" "${corr}" "${numm}";
done
Using another sed feature, the address range in sed -e '/address1/,/address2/ d', makes it simpler:
grep -E "^:$|XSAP|AT" myfile.txt |
sed 's/$/EOL/' |
tr -d "\n" |
sed -e 's/EOLAT//g' -e 's/EOL/\n/g' |
# The range delete needs its own sed: the first sed still sees one joined line.
sed -e '1,/^:$/ d' -e 's/#/ /g' |
while read -r xsap corr numm rest_of_line; do
printf '"%s","%s","%s"\n' "${xsap}" "${corr}" "${numm}";
done
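As a hedged sketch, the same idea can be applied to the second file type by selecting job lines on "#" and lines starting with "AT " instead of the string XSAP. The function name jobs_to_csv and the trimmed sample are assumptions made for this example. The range delete runs in its own sed, after the joined text has been split back into lines, and the literal-newline form is used in case \n is not supported.

```shell
# jobs_to_csv is a hypothetical name; it reads the schedule from stdin.
jobs_to_csv() {
  grep -E '^:$|#|^AT ' |
    sed 's/$/EOL/' |
    tr -d '\n' |
    sed -e 's/EOLAT//g' -e 's/EOL/\
/g' |
    sed -e '1,/^:$/ d' -e '/^$/ d' -e 's/#/ /g' |
    while read -r sched job at_time rest_of_line; do
      printf '"%s","%s","%s"\n' "$sched" "$job" "$at_time"
    done
}

# Trimmed copy of testing.txt from the question.
sample='SCHEDULE ODSSLT_P09#COR-ODS-SMT9-B01 TIMEZONE UTC
AT 0505
:
ODSSLT_P09#CORODSSMT9001-01
UNTIL 2355 TIMEZONE Asia/Shanghai
EVERY 0100
ODSSLT_P09#CORODSSMT9001-02
AT 2355
EVERY 0100
ODSSLT_P09#CORODSSMT9001-03'

csv=$(printf '%s\n' "$sample" | jobs_to_csv)
printf '%s\n' "$csv"
```

Jobs without an AT line come out with an empty third field, which is what the expected output in the question shows. The sketch still relies on no line containing the string EOL.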
Answer 1: (score: 1)
Here is a more or less pure awk solution that produces the requested output for the input file. It has no knowledge of the problem domain.
awk '
/^:/ { start=1; next }
! start {next}
$1 == "AT" {
split(last,a,/#/)
printf "\"%s\",\"%s\",\"%s\"\n", a[1], a[2], $2
last=""
next
}
{
last=$0
}' data