Removing complete URLs from a text file with Unix awk / sed / grep

Time: 2015-06-01 17:10:23

Tags: bash unix awk sed grep

I have a text file of tweets, and I am having trouble removing the complete URLs from it. A sample of the text file:

this is a tweet that has info. http://google.com
this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq

I want to create a new file that contains only:

this is a tweet that has info.
this is a tweet that has an image.

Right now I am using grep, and this is what I have:

grep -oP "http://\K[^']+" final.txt

Thanks!

2 answers:

Answer 0: (score: 1)

sed 's/http[^ ]*//g' YourFile  

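[^ ]* matches any run of characters other than a space, so the substitution deletes everything from http up to the next space.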
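
To see what this command does on the sample tweets from the question, here is a small sketch. The function names strip_http and strip_http_trim are made up for illustration. The second variant is not part of the answer; it assumes the space before the URL should go as well, so that no trailing blank is left behind. Neither form touches pic.twitter.com/a2y4H1b2Jq, because that token does not start with http.

```shell
#!/bin/sh
# The answer's command as-is: delete "http" plus everything up to the next space.
strip_http() {
  sed 's/http[^ ]*//g' "$@"
}

# Illustrative variant (an assumption, not from the answer):
# also remove the spaces in front of the URL.
strip_http_trim() {
  sed 's/ *http[^ ]*//g' "$@"
}

# Sample input taken from the question.
printf '%s\n' \
  'this is a tweet that has info. http://google.com' \
  'this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq' |
  strip_http_trim
```

With the sample input, the first line loses its URL and the second line comes through unchanged.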

Answer 1: (score: 1)

It depends on how restrictive you want it to be.

Complete URLs that start with HTTP and include a separator:

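
A minimal sketch of a command for this case, assuming "separator" means the :// that follows the scheme and that a URL ends at the next whitespace character. This is not the answerer's own command; strip_scheme_urls is a hypothetical name, and -E (extended regular expressions) is assumed to be available, as it is in GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical helper: remove tokens that begin with http:// or https://,
# together with any whitespace directly in front of them.
strip_scheme_urls() {
  sed -E 's#[[:space:]]*https?://[^[:space:]]+##g' "$@"
}

# Sample input taken from the question.
printf '%s\n' \
  'this is a tweet that has info. http://google.com' \
  'this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq' |
  strip_scheme_urls
```

Because the pattern insists on a scheme, the pic.twitter.com link on the second line is left alone.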

Anything containing a dot, with any separator:

sed -e 's|\bhttp[^ ]*\.[^ ]*\b||g' test.html
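
Note that the pattern above still requires the token to begin with http, so pic.twitter.com/a2y4H1b2Jq from the question would survive it. A rough sketch that drops that requirement is shown here, assuming a "URL" is any whitespace-delimited token with at least one non-space character on each side of a dot. The name strip_dotted_tokens is hypothetical, and -E is assumed to be supported by the local sed.

```shell
#!/bin/sh
# Hypothetical helper: remove any whitespace-delimited token that has a dot
# in the middle, plus the whitespace directly in front of it.
# A word that merely ends a sentence ("info.") has nothing after the dot
# and is therefore kept.
strip_dotted_tokens() {
  sed -E 's/[[:space:]]*[^[:space:]]+\.[^[:space:]]+//g' "$@"
}

# Sample input taken from the question.
printf '%s\n' \
  'this is a tweet that has info. http://google.com' \
  'this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq' |
  strip_dotted_tokens
```

This is deliberately loose: it would also delete ordinary text such as 3.5 or e.g., so it only suits input where dotted tokens are known to be links.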