I am new to regular expression matching. Suppose I want to find all URLs in a comma-separated text file and replace them with the word "url".
user,user,' http://twitpic.com/2y1zl - awww, that\'s a bummer. you shoulda got david carr of third day to do it. ;d',0
user,user,'is upset that he can\'t update his facebook by texting it... and might cry as a result school today also. blah!',0
user,user,' i dived many times for the ball. http://twitpic.com/2y1zl managed to save 50\% the rest go out of bounds',0
user,user,'my whole body feels itchy and like its on fire ',0
user,user,' no, it\'s not behaving at all. i\'m mad. why am i here? because i can\'t see you all over there. ',0
user,user,' not the whole crew ',0
user,user,'need a hug ',0
user,user,' hey long time no see! yes.. rains a bit ,only a bit lol , i\'m fine thanks , how\'s you ?',0
user,user,'_k nope they didn\'t have it ',0
user,user,'que me muera ? ',0
user,user,'spring break in plain city... it\'s snowing ',0
user,user,'i just re-pierced my ears ',0
I would like the output to look like this:
user,user,' *url*- awww, that\'s a bummer. you shoulda got david carr of third day to do it. ;d',0
user,user,'is upset that he can\'t update his facebook by texting it... and might cry as a result school today also. blah!',0
user,user,' i dived many times for the ball. *url* managed to save 50\% the rest go out of bounds',0
user,user,'my whole body feels itchy and like its on fire ',0
user,user,' no, it\'s not behaving at all. i\'m mad. why am i here? because i can\'t see you all over there. ',0
user,user,' not the whole crew ',0
user,user,'need a hug ',0
user,user,' hey long time no see! yes.. rains a bit ,only a bit lol , i\'m fine thanks , how\'s you ?',0
user,user,'nope they didn\'t have it ',0
user,user,'que me muera ? ',0
user,user,'spring break in plain city... it\'s snowing ',0
user,user,'i just re-pierced my ears ',0
I tried sed:
sed -e 's/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$//URL/' filename.txt |less
The find-and-replace regular expression does not work.
Answer 0 (score: 0)
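By default, GNU sed uses basic regular expressions, which need a lot of backslashes (ref: https://www.gnu.org/software/gnulib/manual/html_node/Regular-expression-syntaxes.html#Regular-expression-syntaxes). In addition, sed regular expressions do not understand Perl's \d, and \w is only available as a GNU extension.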
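As a small illustration of the two points above, here is a sketch that assumes GNU sed and a made-up input line. It shows the quantifier written with and without a backslash, and the POSIX class [[:digit:]] standing in for \d.

```shell
# Basic syntax (the default): + has to be written \+ (a GNU extension).
printf '%s\n' 'room 101' | sed 's/[[:digit:]]\+/N/'

# Extended syntax (-E): + works without a backslash.
printf '%s\n' 'room 101' | sed -E 's/[[:digit:]]+/N/'
```

Both commands print "room N".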
Matching URLs is a very hard problem. Start with
sed 's@https\?://[^[:blank:]]\+@*url*@g' file
This uses an alternate delimiter for the s/// command, to avoid having to escape the slashes.
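The same substitution can also be written with extended regular expressions, which drops the backslashes before ? and +. This is only a sketch: the function name replace_urls is hypothetical, and it assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
# Hypothetical helper: replace every http/https URL on stdin with *url*.
# -E selects extended regular expressions, so ? and + need no backslash.
# @ is the s command delimiter, so the slashes in :// need no escaping.
replace_urls() {
  sed -E 's@https?://[^[:blank:]]+@*url*@g'
}

# Sample line modelled on the question's data.
printf '%s\n' "user,user,' i dived many times for the ball. http://twitpic.com/2y1zl managed to save',0" | replace_urls
# prints: user,user,' i dived many times for the ball. *url* managed to save',0
```

Because the pattern stops only at a blank, a URL that sits directly before the closing quote of a field would take the quote and the rest of the line with it.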
Answer 1 (score: 0)
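This should work as long as each URL is followed by a space or by some other character that cannot appear in a URL.
I have not handled non-http URLs or user/password combinations here; it is just http or https followed by a run of characters that are allowed in a URL.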
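sed -e 's|https\?://[][0-9a-zA-Z._~:/?#@!$&()*+,;=%'\''-]\+|URL|g'
The delimiter is | rather than /, which makes the slashes easier to handle; it is a character that does not occur in the pattern itself. The range is written a-zA-Z because a-Z is not a valid range in many locales. The '\'' sequence closes the single-quoted string, adds a literal single quote, and reopens the quoting, so that the single quote becomes part of the character class.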
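Here is a minimal sketch of the character-class approach, assuming GNU sed (for \? and \+). The function name replace_urls_class is hypothetical. In this sketch the letter range is written a-zA-Z and | serves as the delimiter, so the delimiter never appears inside the pattern.

```shell
# Hypothetical helper: replace http/https URLs on stdin with URL.
# '\'' ends the single-quoted string, adds a literal ', and reopens the quotes,
# which puts the apostrophe inside the bracket expression.
replace_urls_class() {
  sed -e 's|https\?://[][0-9a-zA-Z._~:/?#@!$&()*+,;=%'\''-]\+|URL|g'
}

# The shell joins the quoted pieces into a single argument.
printf '%s\n' 'it'\''s'
# prints: it's

# A URL in the middle of a field ends at the following space.
printf '%s\n' "user,user,' i dived. http://twitpic.com/2y1zl managed to save',0" | replace_urls_class
# prints: user,user,' i dived. URL managed to save',0
```

Since the apostrophe and the comma are both in the class, a URL placed directly before the closing quote of a field would also consume the quote and what follows it.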