From text

Posted: 2017-05-09 07:54:42

Tags: python regex python-3.x

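I have the following URLs that I need to scrape from a web page. This is my attempt, but when a URL contains =, it skips part of the string. The different kinds of URLs are listed below: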

  

AjaxRender.htm?encparams=2~2586506573108327708~9SpSI_aPBiryk3VIKwmkjN-FD4jkS5GoDobsBCN6pRnZOhBsmrEOgT8vg5KciKjOmt25k3kEDZ00r7f48bIsRPSZWTHJbSCpS815cCNyQrsobBzLZlao8ww-rWwg0lLDIb10gJ3vWUl3zLIAQi5vBGLglJKXcSEg7wCXZUEm5aVHCQiGChz5f8oeiBtPXAV_A9XQ7xU5HUzyzTzyEMJICw==&amp;rwebid=8347449&amp;rhost=1

     

AjaxRender.htm?encparams=7~502085760588479881~-8dtDO_8-jpTBfqALerDcDLkIIRWnom8BG9WdtIVgqGTlRDn37waNvbaM_VHLrcntsGabZPzMiFlNxsrmqx4VpCZrtJmjyCcOBr9AY1B2GxnTlh3ngYfIYbhnDi-W6Hpb8V77OSS-WviMKsgF87gcWvjGzEd02a7Q_3XQ2FvdZ2rvwDlwG4izypuO7Ob63Gh&amp;rwebid=8347449&amp;rhost=1

     

AjaxRender.htm?encparams=5~6917684668841666406~70K0Ijfg4OhPeKlzLP8aQV4JjBq9WuXnpC3enGYXfyWj5-28RyHRnjGRJypZBi0knr3io-9UdjdlOWuLqisI_pkZ0hQzFA5bhlRkX7siC6uMUA6A_MntiLDNGTrKN47TvrAxRd_JpQQUprReVHYwSdUEQvVUtpKn1_Ku5WG_zyWe_0Sd7FLftU1ti6pYf_tfMyNiDalQzyrPDQ35sAXcYIDyhSYI08uZCmTq5vrjSNkQChnMSW73MKri42rVM3JVP8j5LfCf3Zrws54M8KkFRnvfsyYeYd-hATgywsv9i2rtU3A-KPP6lSrL6jqbkAXVTezFRYV00ZNUhvX8NrL8Ew==&amp;rwebid=8347449&amp;rhost=1

     

AjaxRender.htm?encparams=2~3180584448022130058~v_d5bPfBCJINSmPxaUaByy3S5n5h5UbQ53k5QKhqYbz7KXeHku95HjcqE2MnU18rRhcdnBghBW90u-GS3tqZc5FBGt6Z9-mNBnr6RPTiAlIdlG9vO8QDW7e7vMS5H2Yue3sRQ5ANzNKGoAXe3Z5GpC1HWW9DA55OGRkLRsGdNRbN3VkqiVpObCQNGHyDYhfrh_WF8uPpAb5WE2s9sDvrSVDkUfuvHclAarXoua9OYsDQtYaDGxaquDkZrIO-VEYgjv-CPKwCkOQyOVqdq--QQ-GQNvi8vHk05uoiU6-9Kg4=&amp;rwebid=8347449&amp;rhost=1


AjaxRender.htm?encparams=5~7279828915011575224~9KhHzCPV9FXMYfGNPF7W0MNL_4Ljv3YFdCr_JVtQN0GhUhD7ohGtUTYCzRJvS4sI6uyoM3TTrNmHaMsidk_BiN9qXRKpdEhJHGgfHHLzU1vtAXejIwnQUxB5Oexjkt74WeBnEfVSrxVfvhRM3LoB076SYiK5x92bA8WqJg62YtsUWV7vqtsCpvKyn9ssF7nnjlTmUqIWpBkqC9ZtcfN7-A==&amp;rwebid=8347449&amp;rhost=1

     

AjaxRender.htm?encparams=7~502085760588479881~-8dtDO_8-jpTBfqALerDcDLkIIRWnom8BG9WdtIVgqGTlRDn37waNvbaM_VHLrcntsGabZPzMiFlNxsrmqx4VpCZrtJmjyCcOBr9AY1B2GxnTlh3ngYfIYbhnDi-W6Hpb8V77OSS-WviMKsgF87gcWvjGzEd02a7Q_3XQ2FvdZ2rvwDlwG4izypuO7Ob63Gh&amp;rwebid=8347449&amp;rhost=1


AjaxRender.htm?encparams=3~4781276045400603393~duZpRpWJA0naDjmpXNSp__ILjEXoOrwiv9SVBUjldBK4ebRdYWlzxwRudeyHrXoCC-XM_xEKr475_ViwwaHlnqFgEqteM3N6bDAgOxWEc8Y5Klh5d3Ivb_6qG6VsfMmp8oaT3nLnuALjX8vfqBN72WsNlwWeGMR3lOTuQnHgbl2betlejT6KsRx7ycVv71mxe8BP7oDIdI29Baetjlv1YA==&amp;rwebid=8347449&amp;rhost=1

     

AjaxRender.htm?encparams=6~7112793196313446100~IVBMr0jpuDOH9HKclY47FtAJQXrgqOsD6P7mbOwJOcbDWAbviVmEg1HZScYqiKL5svd6BGA7jm22V6uEvquNb_-cZEyfDIFGbNxF3WNTwXcGX13GWcVi6tg7Acgdw8SHEEvhJzw1U01lvMS-Ptks6eeWj0cDdM_Al9hS5WkUA4ZR7rQK5CU9Uovn9WWF5I-6Ot0zcXZKaJMNIndiPYdIq0rpcpehlB8k&amp;rwebid=8347449&amp;rhost=1

     

AjaxRender.htm?encparams=10~1438958542856547329~OUrqnIrSPt0QON_7Q12RhcKfwyc22cFvE0xIIobEoUIFu91yWu5SK_jSW59wazXcfcxjpZnQ9YTWAH5kxu8H2B-lu2vO9J47cqg9ThA6AvDFRhj-6moF1_6ymrCKqhbcJdQddN24hShw9IwJOs2uDYJ2bECVJlnoraak4PGtBLHV4TnoVy9eZJxVPNB3XbIumIivk84XZyg=&amp;rwebid=8347449&amp;rhost=1

The problem I run into is that - and = occasionally appear right before &rwebid (base64?), and that breaks things.
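The encparams payloads above look like URL-safe base64 (letters, digits, `-` and `_`, with optional `=` padding) split into `~`-separated fields, so any character class that stops on `-` or `=` will cut the value short. A minimal sketch, assuming that alphabet is all the value ever contains, lists those characters explicitly and lets the match run until the `&` of the next parameter. The sample string is a shortened, hypothetical value, not one of the real URLs.

```python
import re

# Assumption: encparams only ever contains URL-safe base64 characters
# (A-Z, a-z, 0-9, '-', '_'), '=' padding and the '~' field separator.
ENCPARAMS_RE = re.compile(r"AjaxRender\.htm\?encparams=([A-Za-z0-9_~=-]+)")

# Shortened, hypothetical sample in the same shape as the real links.
sample = '<a href="AjaxRender.htm?encparams=2~318~9Kg4-A==&amp;rwebid=8347449&amp;rhost=1">'

match = ENCPARAMS_RE.search(sample)
if match:
    # '&' is not in the class, so the capture ends just before '&amp;'
    # while keeping the trailing '-' and '=' characters.
    print(match.group(1))  # 2~318~9Kg4-A==
```

The `-` is placed last inside the brackets so that it is read as a literal hyphen rather than a range.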

Update: https://regex101.com/r/pudx92/2

1 answer:

Answer 0 (score: 1)

Why not simply stop at the string delimiter, the double quote (")?

AjaxRender[.]htm[?]encparams=[^\"]*
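A minimal sketch of running this pattern from Python with `re.findall` over the page source. It assumes the links sit inside double-quoted HTML attributes, so the `&` separators arrive as `&amp;`; `html.unescape` turns them back into `&` before `urllib.parse.parse_qs` splits the query string. The HTML snippet is a shortened, hypothetical stand-in for the real page.

```python
import html
import re
from urllib.parse import parse_qs, urlsplit

# The answer's pattern: match everything up to the closing '"' of the attribute.
URL_RE = re.compile(r'AjaxRender[.]htm[?]encparams=[^"]*')

# Shortened, hypothetical stand-in for the real page source.
page = (
    '<div data-url="AjaxRender.htm?encparams=2~318~9Kg4-A==&amp;rwebid=8347449&amp;rhost=1"></div>'
    '<div data-url="AjaxRender.htm?encparams=7~502~-8dtDO_Gh&amp;rwebid=8347449&amp;rhost=1"></div>'
)

extracted = []
for raw in URL_RE.findall(page):
    url = html.unescape(raw)  # '&amp;' -> '&'
    # parse_qs splits each pair on the first '=', so '==' padding stays in the value.
    params = parse_qs(urlsplit(url).query)
    extracted.append((params["encparams"][0], params["rwebid"][0]))

print(extracted)
```

Stopping at `"` depends on the attribute being double-quoted; if the page ever uses single quotes, `'` would need to be added to the negated class as well.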