Question

I have lines like these:

ORIGINAL

sometext1 sometext2 word:A12 B34 C56 sometext3 sometext4
sometext5 sometext6 word:A123 B45 C67 sometext7 sometext8
sometext9 sometext10 anotherword:(someword1 someword2 someword3) sometext11 sometext12

EDITED

asdjfkklj lkdsjfic kdiw:A12 B34 C56 lksjdfioe sldkjflkjd
lknal niewoc kdiw:A123 B45 C678 oknes lkwid 
cnqule nkdal anotherword:(kdlklks inlqok mncvmnx) unqieo lksdnf

Expected output:

asdjfkklj lkdsjfic kdiw:A12-B34-C56 lksjdfioe sldkjflkjd
lknal niewoc kdiw:A123-B45-C678 oknes lkwid 
cnqule nkdal anotherword:(kdlklks-inlqok-mncvmnx) unqieo lksdnf

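Edit: Is this any clearer? Frankly, though, it is harder to read and answer than the sometext# version. I don't know what other people prefer.

I only want to replace the spaces after a letter followed by some digits with dashes, and the spaces between two round brackets with dashes, and no other whitespace. An explanation of the syntax would be much appreciated.

Thanks!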

Answer 1

This code works well:

darby@Debian:~/Scrivania$ cat test.txt | sed -r 's@\s+([A-Z][0-9]+)@-\1@g' | sed ':l s/\(([^ )]*\)[ ]/\1-/;tl'
asdjfkklj lkdsjfic kdiw:A12-B34-C56 lksjdfioe sldkjflkjd
lknal niewoc kdiw:A123-B45-C678 oknes lkwid 
cnqule nkdal anotherword:(kdlklks-inlqok-mncvmnx) unqieo lksdnf

Explanation of my regular expressions

In the first regular expression

Options

-r              Enable extended regular expressions

Pattern

\s+             One or more whitespace characters
([A-Z][0-9]+)   Capture an uppercase letter followed by one or more digits

Replace

-              Dash character
\1             Previous submatch

Note

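The g after the closing delimiter (s@...@...@g here, more commonly written s/.../.../g) makes the substitution global, so it is applied to every match on the line rather than only the first.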
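
To see the first substitution on its own, here is a minimal sketch. It assumes GNU sed (the \s shorthand and -r are GNU extensions), and the function name join_codes is only an illustrative choice, not part of the original answer.

```shell
# Wrap the first substitution in a small helper (hypothetical name).
join_codes() {
  # \s+ before an uppercase letter plus digits becomes a single dash
  sed -r 's@\s+([A-Z][0-9]+)@-\1@g'
}

printf '%s\n' 'asdjfkklj lkdsjfic kdiw:A12 B34 C56 lksjdfioe sldkjflkjd' | join_codes
# prints: asdjfkklj lkdsjfic kdiw:A12-B34-C56 lksjdfioe sldkjflkjd
```

Note that the pattern does not look at what comes before the whitespace, so a line such as "foo A1 bar" would become "foo-A1 bar". That is harmless for the sample data, where every code run follows "kdiw:", but it may matter on other input.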

In the second regular expression

Pattern

:l             define a label named l, the target of a t or b branch
tl             conditional branch: jump to label l if a substitution has succeeded since the last input line was read or the last t command was executed; with no label, t jumps to the end of the script
\(([^ )]*\)    capture a literal ( plus the following run of characters that are neither a space nor ); in basic regex syntax \( \) delimit the group and a bare ( is literal
[ ]            one space character

Replace

\1             Previous submatch
-              Add a dash
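
The loop can be tried in isolation as follows. This is a sketch assuming GNU sed; the helper name dash_parens is hypothetical, and the script is split across -e options only so that the label, the substitution, and the branch are easy to tell apart.

```shell
# Replace one space inside (...) per iteration until none is left.
dash_parens() {
  sed -e ':l' \
      -e 's/\(([^ )]*\) /\1-/' \
      -e 'tl'
}

printf '%s\n' 'cnqule nkdal anotherword:(kdlklks inlqok mncvmnx) unqieo lksdnf' | dash_parens
# prints: cnqule nkdal anotherword:(kdlklks-inlqok-mncvmnx) unqieo lksdnf
```

Each pass rewrites only the first space that follows the opening bracket, and the loop stops when the text after ( runs straight into ), so spaces outside the brackets are left alone.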

Answer 2

This might work for you (GNU sed):

sed -r ':a;s/(A[0-9]+(-[A-Z][0-9]+)*) ([A-Z][0-9]+)/\1-\3/;ta;s/(\(\S+(-\S+)*) (\S+( \S+)*\))/\1-\3/;ta' file

It uses regular expressions and back-references to iteratively replace the spaces within the required strings.
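
The same script is easier to follow when spread over several lines with comments. The sketch below assumes GNU sed (for -r and \S), and join_both is a hypothetical wrapper name. As written in the answer, the first pattern only recognises a run whose first code starts with the letter A, which matches the sample data.

```shell
# Annotated form of the one-liner above (GNU sed assumed).
join_both() {
  sed -r '
    :a
    # join one more code onto a run that starts with A<digits>
    s/(A[0-9]+(-[A-Z][0-9]+)*) ([A-Z][0-9]+)/\1-\3/
    ta
    # replace the first remaining space inside ( ... )
    s/(\(\S+(-\S+)*) (\S+( \S+)*\))/\1-\3/
    ta
  '
}

printf '%s\n' 'lknal niewoc kdiw:A123 B45 C678 oknes lkwid' | join_both
# prints: lknal niewoc kdiw:A123-B45-C678 oknes lkwid
```

Every successful substitution sends control back to :a, so each line is rescanned until neither pattern matches any more.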

Answer 3

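You need to capture the first letter-plus-digits group with (), and the letter that starts the second group as well. Then you simply rebuild the match with the back-references \1 and \2:

Using sed twice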

sed -E 's/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/g' | sed -E 's/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/g'

Or use perl, with a lookahead (?=...) so that the second group is checked but not consumed by the match:

perl -pe 's/(\b[A-Za-z][0-9]+) (?=[A-Z])/\1-/g'

\b word boundary
[A-Za-z] one letter
[0-9]+ one or more digits

sed does not support lookahead or lookbehind.
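
Why the sed version is run twice can be shown with a short sketch (GNU sed assumed for \b; join_once is a hypothetical helper name). The match for "A12 B" consumes the B, so within a single pass "B34 C" can no longer match; the second pass picks up the spaces that were skipped.

```shell
# One application of the substitution from the sed pipeline above.
join_once() {
  sed -E 's/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/g'
}

printf '%s\n' 'kdiw:A12 B34 C56 end' | join_once
# prints: kdiw:A12-B34 C56 end

printf '%s\n' 'kdiw:A12 B34 C56 end' | join_once | join_once
# prints: kdiw:A12-B34-C56 end
```

The perl lookahead avoids the second pass because (?=[A-Z]) tests the next letter without making it part of the match.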
