I have a large email file that contains a list of random hosts like the following:
......
HOSTS: test-host,host2.domain.com,
host3.domain.com,another-testing-host,host.domain.
com,host.anotherdomain.net,host2.anotherdomain.net,
another-local-host, TEST-HOST
DATE: August 11 2015 9:00
.......
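The hosts are always separated by commas, but they may be wrapped across one, two, or more lines (unfortunately I have no control over this; it is what the email client does).
So I need to extract all the text between the string "HOSTS:" and the string "DATE:", unwrap it, and replace the commas with newlines, like this: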
test-host
host2.domain.com
host3.domain.com
another-testing-host
host.domain.com
host.anotherdomain.net
host2.anotherdomain.net
another-local-host
TEST-HOST
So far I have come up with the following, but it drops everything that is on the same line as "HOSTS:":
sed '/HOST/,/DATE/!d;//d' ${file} | tr -d '\n' | sed -E "s/,\s*/\n/g"
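Answer 0 (score: 7)
Something like this might work for you (it relies on GNU sed extensions such as \| and \n in the replacement):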
sed -n '/HOSTS:/{:a;N;/DATE/!ba;s/[[:space:]]//g;s/,/\n/g;s/.*HOSTS:\|DATE.*//g;p}' "$file"
Breakdown:
-n                          # Suppress automatic printing
/HOSTS:/ {                  # On a line containing the literal HOSTS:
    :a;                     # Label used for branching (goto)
    N;                      # Append the next line to the pattern space
    /DATE/!ba;              # Until the literal DATE is matched, go back to :a
    s/[[:space:]]//g;       # Delete all whitespace, including embedded newlines
    s/,/\n/g;               # Replace each comma with a newline
    s/.*HOSTS:\|DATE.*//g;  # Remove everything up to and including HOSTS:,
                            # and everything from DATE onward
    p                       # Print the pattern space
}
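To watch the loop work end to end, the sketch below feeds the sample from the question through the same one-liner. It assumes GNU sed; the `sample` variable and the `extract_hosts` function are illustrative names, not part of the original answer.

```shell
#!/bin/sh
# Sample input copied from the question (variable name is illustrative).
sample='......
HOSTS: test-host,host2.domain.com,
host3.domain.com,another-testing-host,host.domain.
com,host.anotherdomain.net,host2.anotherdomain.net,
another-local-host, TEST-HOST
DATE: August 11 2015 9:00
.......'

# Hypothetical wrapper around the one-liner above; reads stdin.
# Assumes GNU sed (\| alternation and \n in the replacement).
extract_hosts() {
    sed -n '/HOSTS:/{:a;N;/DATE/!ba;s/[[:space:]]//g;s/,/\n/g;s/.*HOSTS:\|DATE.*//g;p}'
}

hosts=$(printf '%s\n' "$sample" | extract_hosts)
printf '%s\n' "$hosts"
```

Because all whitespace is deleted before the commas are expanded, the host that was wrapped as "host.domain." / "com" comes out rejoined as host.domain.com.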
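Answer 1 (score: 2)
awk -v RS='HOSTS: *|DATE:' 'NR==2{gsub(/[ \n]/,"");gsub(/,/,"\n");print}' input
This relies on a regular-expression RS, which gawk and mawk support; POSIX leaves a multi-character RS unspecified.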
Answer 2 (score: 2)
Another approach, using awk and tr:
$ awk '/^HOSTS:/{$1="";p=1} /^DATE:/{p=0} p' file | tr -d ' \n' | tr ',' '\n'; echo ""
test-host
host2.domain.com
host3.domain.com
another-testing-host
host.domain.com
host.anotherdomain.net
host2.anotherdomain.net
another-local-host
TEST-HOST
Answer 3 (score: 2)
Here is another sed script that might work for you:
script.sed
/HOSTS:/,/DATE/ {
    /DATE/! H;                   # append to HOLD space
    /DATE/ {
        g;                       # copy HOLD space into PATTERN space
        s/([\n ])|(HOSTS:)//g;   # remove unwanted strings
        s/,/\n/g;                # replace comma with newline
        p;                       # print
    }
}
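Use it like this: sed -nrf script.sed yourfile
The block applies to the lines between HOSTS: and DATE. Within it, lines that do not match DATE are appended to the hold space, and the line that matches DATE triggers the longer action.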
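The hold-space mechanics can be seen in isolation with a smaller sketch. The input and the END marker here are made up for illustration; only the H and g commands mirror the script.

```shell
#!/bin/sh
# Lines before END are appended to the hold space (H). On END, g copies the
# hold space into the pattern space, and the embedded newlines are removed.
joined=$(printf 'a,\nb,\nc\nEND\n' | sed -n '/END/!H; /END/{g; s/\n//g; p;}')
printf '%s\n' "$joined"
```

Note that H always inserts a newline before the appended line, so the collected text starts with a newline; the substitution removes that one together with the others.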
Answer 4 (score: 1)
Perl to the rescue!
perl -ne '
    if (my $l = (/^HOSTS:/ .. /^DATE:/)) {
        chomp;
        s/^HOSTS:\s+// if 1 == $l;
        s/DATE:.*// if $l =~ /E/;
        s/,\s*/\n/g;
        print;
    }' input-file > output-file
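The flip-flop operator .. returns a number, which in this case is the line number within the current block. That makes it easy to strip HOSTS: from the first line (1 == $l). The last line can be recognised by the E0 appended to the number, which is how DATE:... gets removed.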
Answer 5 (score: 1)
cat "${file}" | awk 'BEGIN {A=0;} /^HOST/ {A=1;} /^DATE/ {A=0} {if (A==1) print;}' | tr -d '\n' | sed -E "s/,\s*/\n/g" | sed -E 's/^HOSTS:\s*//'
Answer 6 (score: 1)
awk 'sub(/^HOSTS: /,""){rec=""} /^DATE/{gsub(/ *, */,"\n",rec); print rec; exit} {rec = rec $0}' file
test-host
host2.domain.com
host3.domain.com
another-testing-host
host.domain.com
host.anotherdomain.net
host2.anotherdomain.net
another-local-host
TEST-HOST