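I have done my research but could not find a solution to my problem. I am trying to extract all valid words (those that start with a letter) from a string and join them with underscores ("_"). I am looking for a solution using awk, sed, or grep.
Something like: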
echo "The string under consideration" | (awk/grep/sed) (pattern match)
Example 1
Input:
1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
Expected output:
L2_Traffic_house_seen_during_ABCD_from
Example 2
Input:
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
Expected output:
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
Example 3
Input:
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE
Expected output:
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
Answer 0 (score: 2)
This might work for you (GNU sed):
sed 's/[[:punct:]]/ /g;s/\<[[:alpha:]]/\n&/g;s/[^\n]*\n//;s/ [^\n]*//g;y/\n/_/' file
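The one-liner above is dense, so here is a sketch of the same program split into one expression per step. It assumes GNU sed (the `\<` word-start anchor and `\n` in the replacement are GNU extensions), and the function name `extract_words_sed` is only an illustrative choice, not part of the original answer.

```shell
# Hypothetical helper name; reads lines on stdin. Assumes GNU sed.
extract_words_sed() {
  # 1. Turn every punctuation character (including "_") into a space.
  # 2. Put a newline in front of each letter that starts a word.
  # 3. Drop everything up to and including the first newline (leading junk).
  # 4. After each kept word, drop from the first space up to the next newline.
  # 5. Turn the remaining newlines into underscores.
  sed -e 's/[[:punct:]]/ /g' \
      -e 's/\<[[:alpha:]]/\n&/g' \
      -e 's/[^\n]*\n//' \
      -e 's/ [^\n]*//g' \
      -e 'y/\n/_/'
}

printf '%s\n' 'XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi' | extract_words_sed
```

Because step 1 also replaces underscores, a token such as VRECYY_FAIL is split and then rejoined, so the visible result is unchanged. Words that begin with a digit (0x880016, 3a) never get a newline marker in step 2 and are removed in step 4.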
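Answer 1 (score: 1)
A Perl one-liner. It searches for an alphabetic character followed by any number of word characters, enclosed in word boundaries. The /g flag makes it try multiple matches on each line.
Contents of infile: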
1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE
Perl command:
perl -ne 'printf qq|%s\n|, join qq|_|, (m/\b([[:alpha:]]\w*)\b/g)' infile
Output:
L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
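Since the question also mentions grep, the same "letter followed by word characters" pattern can be sketched with `grep -o` and `paste`. This is only a sketch under assumptions: it relies on a grep that supports `-o` and the `\<` word-start anchor (GNU grep does), and the function name `extract_words_grep` is made up for illustration.

```shell
# Hypothetical helper name; reads lines on stdin.
# Assumes a grep with -o and the \< anchor (e.g. GNU grep).
extract_words_grep() {
  while IFS= read -r line; do
    # grep -o prints each match on its own line;
    # paste -s joins those lines with "_" as the delimiter.
    printf '%s\n' "$line" \
      | grep -o '\<[[:alpha:]][[:alnum:]_]*' \
      | paste -s -d '_' -
  done
}

extract_words_grep <<'EOF'
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE
EOF
```

The loop is there because `grep -o` alone does not mark where one input line ends and the next begins, so each line is matched and joined separately. Starting a new grep and paste per line is slow on large files, where the Perl or awk versions are likely a better fit.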
Answer 2 (score: 1)
One way using awk. Contents of script.awk:
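BEGIN {
    # Split on anything that is not a letter, digit, or underscore
    FS = "[^[:alnum:]_]"
}
{
    out = ""
    for (i = 1; i <= NF; i++) {
        # Keep non-empty fields that do not start with a digit
        if ($i != "" && $i !~ /^[0-9]/) {
            # Join with "_" so a rejected last field leaves no trailing "_" or missing newline
            out = (out == "" ? $i : out "_" $i)
        }
    }
    print out
}
Run it like this:
awk -f script.awk file.txt
Alternatively, here is the one-liner:
awk -F '[^[:alnum:]_]' '{ out = ""; for (i = 1; i <= NF; i++) if ($i != "" && $i !~ /^[0-9]/) out = (out == "" ? $i : out "_" $i); print out }' file.txt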
Result:
L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
Answer 3 (score: 0)
This solution needs some tweaking, and I think gawk is required in order to use a regexp as the "record separator":
http://www.gnu.org/software/gawk/manual/html_node/Records.html#Records
gawk -v ORS='_' -v RS='[-: \"()]' '/^[a-zA-Z]/' file.dat