我的来源:
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
| positives | total | scan_date | url |
+===========+=======+======================+==================================================================================+
| 4 | 65 | 2015-09-21 23:29:33 | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
| | | | prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
| 1 | 64 | 2015-09-17 19:28:50 | http://thebackpack.fr/ |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
| 1 | 64 | 2015-09-17 08:44:16 | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
| | | | prettyphoto/images/prettyPhoto/light_rounded/ |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
我想提取完整的网址(一行中的完整网址):
hxxp://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt
hxxp://thebackpack.fr/
hxxp://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/
多行网址是我的问题。我试过例如:awk '{print $9}'
提前感谢您的帮助!
答案 0 :(得分:3)
您可以使用此awk命令:
awk -F '[[:blank:]]*\\|[[:blank:]]*' 'NR<3 || NF<5{next}
$2{if (url) print url; url=$5; next}
{url=url $5}
END{print url}' file
<强>输出:强>
http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt
http://thebackpack.fr/
http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/