Non-capturing pattern with vi or sed

Time: 2018-12-13 20:15:12

Tags: regex sed vi

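I have a file with about 100,000 lines. Can I convert the input file into the output with a good regular expression, using vi or sed? The pipe-delimited part of each line can contain hundreds of entries.

To summarize what needs to be done: I need to capture an expression at the start of the line and then append it to every entry (that is, it goes before every pipe and before the end of the line).

Input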

G1778-BRAZIL    .A3_Alagoas|.A5_Amazonas|.B3_Bahia|.C4_Ceara|.D5_Distrito Federal|.E8_Espirito Santo|.G6_Goias|.G8_Guanabara
G2807-ATLANTIC OCEAN    .B3_Baffin Bay|.M4_Mexico, Gulf of|.N55_North Atlantic Ocean|.N6_North Sea

Output

G1778-BRAZIL    .A3_Alagoas+G1778-BRAZIL|.A5_Amazonas+G1778-BRAZIL|.B3_Bahia+G1778-BRAZIL|.C4_Ceara+G1778-BRAZIL|.D5_Distrito Federal+G1778-BRAZIL|.E8_Espirito Santo+G1778-BRAZIL|.G6_Goias+G1778-BRAZIL|.G8_Guanabara+G1778-BRAZIL
G2807-ATLANTIC OCEAN    .B3_Baffin Bay+G2807-ATLANTIC OCEAN|.M4_Mexico, Gulf of+G2807-ATLANTIC OCEAN|.N55_North Atlantic Ocean+G2807-ATLANTIC OCEAN|.N6_North Sea+G2807-ATLANTIC OCEAN

2 answers:

Answer 0 (score: 0)

Oh, I see what you're doing now.

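perl -F'/\t|\x20{2,}|[|]/' -lanE '
    # Split on a tab, a run of 2+ spaces, or a pipe, so single spaces
    # inside the key and inside entries ("Distrito Federal") are kept.
    BEGIN { $, = " " }    # one space between the key and the entries
    $a = shift @F;        # first field is the key, e.g. G1778-BRAZIL
    say $a, join "|", map {"$_+$a"} @F
' file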

gawk -F'\t|  +|[|]' '{
    # Same separators as above: a tab, 2+ spaces, or a pipe.
    printf "%s ", $1
    for (i=2; i<=NF; i++) printf "%s+%s%s", $i, $1, (i == NF ? ORS : "|")
}' file

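Answer 1 (score: 0)

I don't know whether the first long run of whitespace is a tab or multiple spaces, so this works either way, assuming the captured string contains no backreference metacharacters (e.g. &):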

$ awk -F'  +|\t' '{gsub(/[|]|$/,"+"$1"&")}1' file
G1778-BRAZIL    .A3_Alagoas+G1778-BRAZIL|.A5_Amazonas+G1778-BRAZIL|.B3_Bahia+G1778-BRAZIL|.C4_Ceara+G1778-BRAZIL|.D5_Distrito Federal+G1778-BRAZIL|.E8_Espirito Santo+G1778-BRAZIL|.G6_Goias+G1778-BRAZIL|.G8_Guanabara+G1778-BRAZIL
G2807-ATLANTIC OCEAN    .B3_Baffin Bay+G2807-ATLANTIC OCEAN|.M4_Mexico, Gulf of+G2807-ATLANTIC OCEAN|.N55_North Atlantic Ocean+G2807-ATLANTIC OCEAN|.N6_North Sea+G2807-ATLANTIC OCEAN
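
If the key could contain & (or a backslash), gsub() would treat it as "the matched text" rather than a literal character. Below is a minimal sketch of one way around that: escape the key before using it in the replacement. It assumes POSIX-style handling of \\& in gsub() replacements, as gawk implements it. The wrapper name append_key and the sample line with R&D-LAB are hypothetical and not from the question.

```shell
# append_key is a hypothetical wrapper name; the sample line below is made up.
append_key() {
    awk -F'  +|\t' '{
        key = $1
        # Escape "\" and "&" so the next gsub() inserts them literally.
        gsub(/[\\&]/, "\\\\&", key)
        # Put "+key" before every pipe and before the end of the line.
        gsub(/[|]|$/, "+" key "&")
    }1' "$@"
}

printf 'R&D-LAB\t.A1_Alpha|.B2_Beta\n' | append_key
# prints: R&D-LAB<TAB>.A1_Alpha+R&D-LAB|.B2_Beta+R&D-LAB
```

For keys without those characters the result is the same as the one-liner above, so it should be safe to use on the whole file with append_key file.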