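Is there a simple, efficient one-liner that replaces every number, or every sequence of digits mixed with symbols (\ / $ & * # @ ) ( - + ! ~ . , : ; " ' ` ^ % _ ] [ { } =), such as: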
1 2 3 4 998898321321
0.2 1.2 32221.111. 1321321321.111
111.11212.21212
212323/12331/321312
121-12123-32131
121+12123+32131
1_212121_2320
12131!~~~323131
etc
with the single token NUMBER in a large (100 GB) text file? Sample input and output:
Input:
hello my friend 212323/12331/321312
hope you are fine 12131!~~~323131 in 33-years from now
happy face is important to maintaion by 98987 321321/32131
Output:
hello my friend NUMBER
hope you are fine NUMBER in 33-years from now
happy face is important to maintaion by NUMBER NUMBER
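Basically, anything between two spaces that consists of digits and non-letter symbols must be replaced by NUMBER. The rest of the text should stay as it is.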
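Both answers implement this rule as "a whitespace-delimited token is replaced when it contains no letters at all". A minimal shell predicate for that interpretation (the name `is_number_token` is hypothetical, not from the question) makes the edge cases easy to check:

```shell
# is_number_token: hypothetical helper that succeeds when a single,
# whitespace-free token should be replaced by NUMBER, i.e. when it
# contains no letters. [[:alpha:]] follows the current locale.
is_number_token() {
  [ -n "$1" ] || return 1   # an empty string is not a token
  ! printf '%s\n' "$1" | grep -q '[[:alpha:]]'
}

for tok in 212323/12331/321312 '12131!~~~323131' 33-years ---; do
  if is_number_token "$tok"; then
    printf '%s -> NUMBER\n' "$tok"
  else
    printf '%s -> kept\n' "$tok"
  fi
done
```

The last case shows that under this interpretation (and with both answers' regexes) a token made only of symbols, such as `---`, is replaced too, even though it contains no digit; requiring a digit would need an extra condition.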
Answer 0 (score: 2)
OK, I think I've got it:
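I need three steps: double every space, replace every space-delimited token that contains no letters with NUMBER, then turn the doubled spaces back into single spaces.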
This is how it looks now:
$ cat test.txt
hello my friend 212323/12331/321312
hope you are fine 12131!~~~323131 in 33-years from now
happy face is important to maintaion by 98987 321321/32131
123 This is a line
$ sed -r 's/ /  /g;s/(^| )[^[:alpha:] ]+( |$)/\1NUMBER\2/g;s/  / /g' test.txt
hello my friend NUMBER
hope you are fine NUMBER in 33-years from now
happy face is important to maintaion by NUMBER NUMBER
NUMBER This is a line
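To see why the first step is needed, here is a sketch that wraps the three-step command in a function and runs the middle substitution on its own for comparison. Both function names are hypothetical, and `-E` is used in place of the answer's `-r` (current GNU sed accepts both):

```shell
# replace_numbers: hypothetical wrapper around the three-step sed command.
replace_numbers() {
  # Step 1 doubles every space, so neighbouring tokens no longer share one.
  # Step 2 replaces each space-delimited token that contains no letters with NUMBER.
  # Step 3 collapses the doubled spaces back into single spaces.
  sed -E -e 's/ /  /g' \
         -e 's/(^| )[^[:alpha:] ]+( |$)/\1NUMBER\2/g' \
         -e 's/  / /g' "$@"
}

# naive_replace: hypothetical step 2 alone, without the doubling.
# The trailing ( |$) group consumes the space that the next token
# would need for its own leading (^| ) group.
naive_replace() {
  sed -E 's/(^| )[^[:alpha:] ]+( |$)/\1NUMBER\2/g' "$@"
}

line='by 98987 321321/32131'
printf '%s\n' "$line" | naive_replace    # by NUMBER 321321/32131
printf '%s\n' "$line" | replace_numbers  # by NUMBER NUMBER
```

Because sed reads and writes one line at a time, a call such as `replace_numbers big.txt > out.txt` should keep memory use flat on a 100 GB file, assuming ordinary line lengths. A tab is still treated as part of a token here.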
Answer 1 (score: 0)
A perl solution to complement chw21's helpful solution; it handles not only single spaces but any mix of spaces and tabs between words:
perl -ple 's/(^|(?<=[[:blank:]]))[^[:alpha:][:blank:]]+((?=[[:blank:]])|$)/NUMBER/g' file
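Using look-behind ((?<=...)) and look-ahead ((?=...)) assertions removes the need for capture groups, and with it the need to double the spaces as an intermediate step. Using [[:blank:]] (a space or a tab) instead of a literal space (just a space) makes any combination of spaces and tabs work: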
(^|(?<=[[:blank:]]))
matches either the start of the line (^) or a position immediately preceded by a blank (space or tab)
[^[:alpha:][:blank:]]+
matches any nonempty run of characters that are neither letters nor blanks
((?=[[:blank:]])|$)
matches at the end of the line ($), or where the next character is a blank.