Extracting URLs from a texttable (multi-line)

Time: 2015-09-26 11:03:03

Tags: bash shell

My source:

+-----------+-------+----------------------+----------------------------------------------------------------------------------+
| positives | total |      scan_date       |                                       url                                        |
+===========+=======+======================+==================================================================================+
|     4     |  65   | 2015-09-21 23:29:33  | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
|           |       |                      | prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt                     |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
|     1     |  64   | 2015-09-17 19:28:50  | http://thebackpack.fr/                                                           |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
|     1     |  64   | 2015-09-17 08:44:16  | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
|           |       |                      | prettyphoto/images/prettyPhoto/light_rounded/                                    |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+

I want to extract the complete URLs, with each full URL on a single line:

hxxp://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt
hxxp://thebackpack.fr/
hxxp://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/

The URLs that wrap across several lines are my problem. I tried, for example: awk '{print $9}'

Thanks in advance for your help!
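
For context, here is a short sketch of why `awk '{print $9}'` falls short, assuming awk's default whitespace splitting. On a full row every `|` and both halves of the timestamp count as separate fields, so the URL lands in `$9`; on a continuation row the empty cells collapse and the fragment lands in `$5` instead. The two rows are copied from the table above; the helper name `show_fields` is made up for illustration.

```shell
#!/bin/sh
# show_fields (hypothetical helper): print the field count, $5 and $9 of each
# input line under awk's default whitespace splitting.
show_fields() {
    awk '{ printf "NF=%d $5=%s $9=%s\n", NF, $5, $9 }'
}

show_fields <<'EOF'
|     1     |  64   | 2015-09-17 08:44:16  | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
|           |       |                      | prettyphoto/images/prettyPhoto/light_rounded/                                    |
EOF
```

The first row reports 10 fields with the URL in `$9`; the second reports 6 fields with the fragment in `$5` and an empty `$9`, which is why the second half of each wrapped URL goes missing.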

1 answer:

Answer 0: (score: 3)

You can use this awk command:

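awk -F '[[:blank:]]*\\|[[:blank:]]*' '
   NR<3 || NF<5 {next}                          # skip the top border, the header and all border lines
   $2 != "" {if (url) print url; url=$5; next}  # first cell filled (even with 0): flush the previous URL, start a new one
   {url=url $5}                                 # first cell empty: append the continuation fragment
   END {if (url) print url}' file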

Output:

http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt
http://thebackpack.fr/
http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/
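
To try the approach without a separate file, the following is a minimal sketch that wraps the same field-splitting idea in a shell function reading from standard input. The function name `extract_urls` is hypothetical. It tests the first cell with `$2 != ""` so that a row with 0 positives still counts as a new record, and it skips the header by its `url` label rather than by line number, which is an assumption on my part about how the table is laid out.

```shell
#!/bin/sh
# extract_urls (hypothetical name): read a texttable on stdin and print one
# complete URL per line.
extract_urls() {
    awk -F '[[:blank:]]*\\|[[:blank:]]*' '
        NF < 5 || $5 == "url" { next }     # skip border lines and the header row
        $2 != "" {                         # first cell filled: a new record starts
            if (url != "") print url
            url = $5
            next
        }
        { url = url $5 }                   # first cell empty: continuation fragment
        END { if (url != "") print url }
    '
}

# The first two records of the table from the question.
extract_urls <<'EOF'
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
| positives | total |      scan_date       |                                       url                                        |
+===========+=======+======================+==================================================================================+
|     4     |  65   | 2015-09-21 23:29:33  | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
|           |       |                      | prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt                     |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
|     1     |  64   | 2015-09-17 19:28:50  | http://thebackpack.fr/                                                           |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
EOF
```

With a real file, `extract_urls < file` should behave the same way. Because the separator regex swallows the blanks around each `|`, the fragments are joined with no stray padding between them.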