Print valid words joined by _

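Time: 2012-07-19 09:31:20

Tags: sed awk grep word concatenation

I have done my research but could not find a solution to this problem. I am trying to extract all valid words (those that start with a letter) from a string and join them with underscores ("_"). I am looking for a solution using awk, sed, or grep.

Something like: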

echo "The string under consideration" | (awk/grep/sed) (pattern match)

Example 1

Input:

1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11

Desired output:

L2_Traffic_house_seen_during_ABCD_from

Example 2

Input:

XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi

Desired output:

XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi

Example 3

Input:

ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE

Desired output:

ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE

4 Answers:

Answer 0: (score: 2)

This might work for you (GNU sed):

sed 's/[[:punct:]]/ /g;s/\<[[:alpha:]]/\n&/g;s/[^\n]*\n//;s/ [^\n]*//g;y/\n/_/' file
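The one-liner above is dense, so here is the same sequence of commands spread over several lines with a comment per step. This is only an annotated sketch: the function name join_words_sed is a hypothetical wrapper, and GNU sed is assumed (for \< and for \n inside patterns and replacements).

```shell
# join_words_sed is a hypothetical wrapper name; assumes GNU sed.
join_words_sed() {
    sed '
        # 1. Turn every punctuation character (including "_") into a space.
        s/[[:punct:]]/ /g
        # 2. Put a newline marker in front of each word that starts with a letter.
        s/\<[[:alpha:]]/\n&/g
        # 3. Drop everything up to and including the first marker.
        s/[^\n]*\n//
        # 4. After each kept word, drop the text from the next space up to the next marker.
        s/ [^\n]*//g
        # 5. Turn the remaining markers into underscores.
        y/\n/_/
    ' "$@"
}

printf '%s\n' '1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11' | join_words_sed
```

Because step 1 also replaces "_" with a space, a name such as VRECYY_FAIL is split into two words and then rejoined by step 5, so the output looks the same for the examples in the question.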

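Answer 1: (score: 1)

A Perl one-liner. It searches for any alphabetic character followed by any number of word characters, with the whole match enclosed in word boundaries. The /g flag makes it attempt multiple matches on each line.

Contents of infile: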

1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE

Perl command:

perl -ne 'printf qq|%s\n|, join qq|_|, (m/\b([[:alpha:]]\w*)\b/g)' infile

Output:

L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
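Since the question also mentions grep, the same word-boundary match can be sketched with grep -o and paste. This is a rough equivalent rather than part of the original answer: join_words_grep is a hypothetical helper name, and GNU grep is assumed for \< and \w.

```shell
# join_words_grep is a hypothetical helper name; assumes GNU grep (\< and \w).
join_words_grep() {
    # Handle one input line at a time so each line gets its own joined output.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" |
            grep -o '\<[[:alpha:]]\w*' |   # one valid word per output line
            paste -sd_ -                   # join those lines with "_"
    done
}

printf '%s\n' 'XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi' | join_words_grep
```

Starting a new pipeline for every input line is slow on large files, so the Perl or awk versions are probably the better choice there.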

Answer 2: (score: 1)

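One way using awk, with the following contents in script.awk:

BEGIN {
    # Split on anything that is not a letter, digit, or underscore.
    FS = "[^[:alnum:]_]"
}

{
    sep = ""
    for (i = 1; i <= NF; i++) {
        # Keep non-empty fields that do not start with a digit.
        if ($i !~ /^[0-9]/ && $i != "") {
            printf "%s%s", sep, $i
            sep = "_"    # prefix later words, so there is no trailing "_"
        }
    }
    print ""             # end the line even if the last field was skipped
}

Run it like this:

awk -f script.awk file.txt

Alternatively, here is the one-liner:

awk -F "[^[:alnum:]_]" '{ sep = ""; for (i=1; i<=NF; i++) { if ($i !~ /^[0-9]/ && $i != "") { printf "%s%s", sep, $i; sep = "_" } } print "" }' file.txt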

Results:

L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE

Answer 3: (score: 0)

This solution needs some tweaking, and I think it requires gawk in order to use a regexp as the record separator: http://www.gnu.org/software/gawk/manual/html_node/Records.html#Records

gawk -v ORS='_' -v RS='[-: \"()]' '/^[a-zA-Z]/' file.dat