Question

My input:

+-----------+-------+----------------------+----------------------------------------------------------------------------------+
| positives | total |      scan_date       |                                       url                                        |
+===========+=======+======================+==================================================================================+
|     4     |  65   | 2015-09-21 23:29:33  | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
|           |       |                      | prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt                     |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
|     1     |  64   | 2015-09-17 19:28:50  | http://thebackpack.fr/                                                           |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
|     1     |  64   | 2015-09-17 08:44:16  | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
|           |       |                      | prettyphoto/images/prettyPhoto/light_rounded/                                    |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+

I want to extract the full URLs, with each complete URL on a single line:

hxxp://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt
hxxp://thebackpack.fr/
hxxp://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/

The URLs that wrap across multiple lines are my problem. For example, I tried: awk '{print $9}'

Thanks in advance for your help!

Answer 1

You can use this awk command:

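awk -F '[[:blank:]]*\\|[[:blank:]]*' '
   NR<3 || NF<5 {next}                           # skip top border, header row and +---+ separator lines
   $2 != "" {if (url) print url; url=$5; next}   # new row: print the previous URL, start a new one
   {url=url $5}                                  # continuation row (empty $2): append the wrapped fragment
   END {print url}                               # print the last URL
' file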

Output:

http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt
http://thebackpack.fr/
http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/
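To see why the URL is read from $5 and why an empty $2 marks a wrapped row, here is a small sketch that prints how the same -F regex splits one data row and one continuation row copied from the question. The helper name show_fields is hypothetical and exists only for this illustration. Border lines such as +-----+ contain no "|" at all, so they come out as a single field, which is why NF<5 skips them.

```shell
#!/bin/sh
# show_fields is a hypothetical helper for this sketch: it splits each stdin line
# with the answer's field separator (optional blanks, a literal "|", optional blanks)
# and prints every field in [brackets] so that empty fields stay visible.
show_fields() {
  awk -F '[[:blank:]]*\\|[[:blank:]]*' '{
    printf "NF=%d:", NF
    for (i = 1; i <= NF; i++) printf " $%d=[%s]", i, $i
    print ""
  }'
}

# A data row and a continuation row, copied from the question's table.
show_fields <<'EOF'
|     1     |  64   | 2015-09-17 19:28:50  | http://thebackpack.fr/                                                           |
|           |       |                      | prettyphoto/images/prettyPhoto/light_rounded/                                    |
EOF

# Expected output:
# NF=6: $1=[] $2=[1] $3=[64] $4=[2015-09-17 19:28:50] $5=[http://thebackpack.fr/] $6=[]
# NF=6: $1=[] $2=[] $3=[] $4=[] $5=[prettyphoto/images/prettyPhoto/light_rounded/] $6=[]
```

The leading "|" produces an empty $1, and on a continuation row the positives column ($2) is empty, so appending $5 to the previous URL rebuilds the full address. Testing $2 != "" rather than plain $2 matters because awk treats a field like "0" as numerically false, so a row with 0 positives would otherwise be mistaken for a continuation row.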

Extracting URLs from a texttable (multi-line)
