I have lines like these:
ORIGINAL
sometext1 sometext2 word:A12 B34 C56 sometext3 sometext4
sometext5 sometext6 word:A123 B45 C67 sometext7 sometext8
sometext9 sometext10 anotherword:(someword1 someword2 someword3) sometext11 sometext12
EDITED
asdjfkklj lkdsjfic kdiw:A12 B34 C56 lksjdfioe sldkjflkjd
lknal niewoc kdiw:A123 B45 C678 oknes lkwid
cnqule nkdal anotherword:(kdlklks inlqok mncvmnx) unqieo lksdnf
Desired output:
asdjfkklj lkdsjfic kdiw:A12-B34-C56 lksjdfioe sldkjflkjd
lknal niewoc kdiw:A123-B45-C678 oknes lkwid
cnqule nkdal anotherword:(kdlklks-inlqok-mncvmnx) unqieo lksdnf
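Edit: Is this any clearer? Frankly, though, it is harder to read and answer than the sometext# version. I don't know what others prefer.
I only want to replace the spaces with dashes between tokens made of a letter followed by some digits, and between the words enclosed in a pair of parentheses. No other whitespace should change. An explanation of the syntax would be much appreciated.
Thanks!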
Answer 0 (score: 1)
This code works well:
darby@Debian:~/Scrivania$ cat test.txt | sed -r 's@\s+([A-Z][0-9]+)@-\1@g' | sed ':l s/\(([^ )]*\)[ ]/\1-/;tl'
asdjfkklj lkdsjfic kdiw:A12-B34-C56 lksjdfioe sldkjflkjd
lknal niewoc kdiw:A123-B45-C678 oknes lkwid
cnqule nkdal anotherword:(kdlklks-inlqok-mncvmnx) unqieo lksdnf
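Explanation of my regular expressions
First regular expression
Options
-r    Enable extended regular expressions
Pattern
\s+    One or more whitespace characters
([A-Z][0-9]+)    Capture group: one uppercase letter followed by one or more digits
Replacement
-    A literal dash
\1    The text matched by the capture group
Note
The g after the closing delimiter (here @, as in s@...@...@g) makes the substitution global: every match on the line is replaced, not only the first.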
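A minimal sketch of this first substitution on its own, run against one sample line from the question. The function name join_codes is made up for illustration, and [[:space:]] stands in for \s so the command does not depend on GNU extensions.

```shell
# Sample line from the question.
line='lknal niewoc kdiw:A123 B45 C678 oknes lkwid'

# Hypothetical helper: same idea as the first sed call, written with the
# usual / delimiter and the POSIX class [[:space:]] instead of \s.
join_codes() {
    sed -E 's/[[:space:]]+([A-Z][0-9]+)/-\1/g'
}

step1=$(printf '%s\n' "$line" | join_codes)
printf '%s\n' "$step1"

# Caveat: the pattern is not tied to the "kdiw:" field, so any
# letter+digits token that follows whitespace is joined as well.
caveat=$(printf '%s\n' 'note Z9 here' | join_codes)
printf '%s\n' "$caveat"
```

The second call shows a side effect worth knowing about: the expression joins every uppercase-letter-plus-digits token that follows whitespace, wherever it appears on the line.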
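Second regular expression
This sed call runs without -r, so it uses basic regular expressions: \( and \) delimit a capture group, and a bare ( matches a literal parenthesis.
Pattern
:l    Defines the label l, the target of the t and b commands
tl    Jumps to label l if a substitution has succeeded since the last input line was read or the last t command was executed. If no label is given, t jumps to the end of the script. This is a conditional branch.
\(([^ )]*\)    Capture group: a literal ( followed by the longest run of characters that are neither a space nor ), i.e. everything from the opening parenthesis up to the first space
[ ]    One space character
Replacement
\1    The text matched by the capture group
-    A literal dash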
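The loop can be tried in isolation, as in the sketch below. It splits the script into separate -e fragments, which is a stylistic choice here and not something the answer requires. Each pass replaces the first space after the opening parenthesis, and t repeats the substitution until none is left.

```shell
# Sample line from the question.
line='cnqule nkdal anotherword:(kdlklks inlqok mncvmnx) unqieo lksdnf'

# Basic regular expressions (no -E): \( \) group, a bare ( is literal.
# Each -e fragment is its own script line, so the label name ends cleanly.
step2=$(printf '%s\n' "$line" |
    sed -e ':l' \
        -e 's/\(([^ )]*\) /\1-/' \
        -e 'tl')
printf '%s\n' "$step2"
```

Because [^ )]* cannot cross the closing parenthesis, the spaces after ) are left alone.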
Answer 1 (score: 1)
This might work for you (GNU sed):
sed -r ':a;s/(A[0-9]+(-[A-Z][0-9]+)*) ([A-Z][0-9]+)/\1-\3/;ta;s/(\(\S+(-\S+)*) (\S+( \S+)*\))/\1-\3/;ta' file
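This uses regular expressions and back-references to replace the spaces inside the required strings iteratively: each substitution joins one more token onto the chain, and ta jumps back to the label a until nothing is left to replace.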
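For readability, the same commands can be laid out one per line with comments, as in this sketch. It assumes GNU sed (for \S) and uses -E, which GNU sed treats the same as -r. The variable names script, input and result are made up for illustration.

```shell
# Hypothetical variable holding the one-liner as a multi-line sed script.
script='
:a
# Join the next code onto a chain that starts with A<digits>.
s/(A[0-9]+(-[A-Z][0-9]+)*) ([A-Z][0-9]+)/\1-\3/
ta
# Join the next word onto the chain that follows an opening parenthesis.
s/(\(\S+(-\S+)*) (\S+( \S+)*\))/\1-\3/
ta
'

# The three EDITED sample lines from the question.
input='asdjfkklj lkdsjfic kdiw:A12 B34 C56 lksjdfioe sldkjflkjd
lknal niewoc kdiw:A123 B45 C678 oknes lkwid
cnqule nkdal anotherword:(kdlklks inlqok mncvmnx) unqieo lksdnf'

result=$(printf '%s\n' "$input" | sed -E "$script")
printf '%s\n' "$result"
```

Note that the first expression only starts a chain at a token beginning with the letter A, which matches the sample data but may need widening for other inputs.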
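Answer 2 (score: 0)
You need to use () to capture the first alphanumeric group and the start of the second group. Then you simply rebuild the match with the back-references \1 and \2:
Using sed twice (one pass is not enough, because each match consumes the letter that begins the next token, so adjacent pairs overlap):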
sed -E 's/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/g' | sed -E 's/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/g'
Or use perl; the lookahead (?=...) checks for the second group without capturing or consuming it, so a single pass is enough:
perl -pe 's/(\b[A-Za-z][0-9]+) (?=[A-Z])/\1-/g'
\b    word boundary
[A-Za-z]    one letter
[0-9]+    one or more digits
sed does not support lookahead or lookbehind.
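The sketch below illustrates why the sed version is piped twice, and shows one possible way to get the same effect in a single sed call by looping with a label and t in place of a lookahead. It assumes GNU sed, since \b is a GNU extension, and the variable names are made up for illustration.

```shell
# Sample line from the question.
line='asdjfkklj lkdsjfic kdiw:A12 B34 C56 lksjdfioe sldkjflkjd'

# The expression from this answer, with and without the g flag.
global='s/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/g'
single='s/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/'

# One pass: "A12 B" is matched and the B is consumed, so "B34 C" is skipped.
once=$(printf '%s\n' "$line" | sed -E "$global")

# Two passes: the second one picks up the pair that was skipped.
twice=$(printf '%s\n' "$line" | sed -E "$global" | sed -E "$global")

# Loop alternative: repeat a single substitution until none succeeds.
looped=$(printf '%s\n' "$line" | sed -E -e ':a' -e "$single" -e 'ta')

printf '%s\n' "$once" "$twice" "$looped"
```

The loop always terminates because every successful substitution removes one space from the line.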