Why doesn't my regex parse this HTTP request correctly?

Time: 2017-03-30 17:11:53

Tags: python regex request python-requests

I have the following HTTP request payload.

X-gmsv=9879480&X-subtype=cIIP-1V_bTg%3AAPA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW&X-X-subscription=cIIP-1V_bTg%3AAPA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW&X-gcm.topic=%2Ftopics%2Fphenotype_com.google.android.gms.icing%25servingVersion&X-X-subtype=cIIP-1V_bTg%3AAPA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW&X-app_ver=9879480&X-kid=%7CID%7C7%7C&X-osv=25&X-sig=vWteecmhHl5q2AsrHaGcOOgaF956SpVk6KAdjijNyeX1uADvPgpgvMkNH-Nu-N8IHc-1Z1ujTytjkQDPZot4zjf_FLSjR0ucPIkFXkZhrRi5RU6uFq-ZlQCEBSPpYuHsx27lC5H3xv-TNe_zC0PaX8h8bTqrImtArVSZjMY6-RFG9TUEj2VkCvs1ixAK21vHxE4ladiXALZO-lhZIvbDIGkY4c-fUMaMBN8EhMr1zH31N41S6cUItkPRe0lTOB4YddkrS2FNRI_LZGfW-cc9h9om-80MskZD0IyJtM4AFsumHxVIQQJwSScASSoFd7e7tANTp5ZPJi2hwr6wQqpveQ&X-cliv=iid-9879000&X-gmsv=9879480&X-pub2=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA9VWXXfudfnoAAl-u_RbBClmI6uvaOH8AFEMvjrtOpL1FuLrUYQzdntRwlMyiL4Nba7WUGeb6CrkEAbwTFcR689QYQ87ytkyY65rD2InSUD3eMLWpiaTciFj-n5sUK6hyci5Je5T8Svgsb-VHSy6vWVKQZ4vGsiGqmkj8sDhCa1UbltWOyhywfG95ENiGKuO_ec55Rmvrew9tFNGIit7FzcNiEAmfSrkEifK6dydjnpahu3lAx4U_MTw5Yo0ou5EGrsByXY2P_tkWg78hq1E_SQORk7q7droAY_wupXHlqSwGCAfbGtRs2gXM-64MSZ1iQX7N7pPojkT4akomcyP2JQIDAQAB&X-X-kid=%7CID%7C7%7C&X-appid=cIIP-1V_bTg&X-scope=%2Ftopics%2Fphenotype_com.google.android.gms.icing%25servingVersion&X-subscription=cIIP-1V_bTg%3AAPA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW&X-app_ver_name=9.8.79+%28480-137224771%29&app=com.google.android.gms&sender=cIIP-1V_bTg%3AAPA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW&device=4374365252386389758&cert=58e1c4133f7441ec3d2c27
0270a14802da47ba0e&app_ver=9879480&info=AoehgPKryS4XzDpwLBRWN3IuplGtswI&gcm_ver=987948

I want to pull out all of the <key>=<value> pairs. For example, the first key-value pair is X-gmsv=9879480.

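The best regex I have come up with is .*?\=.*?&, but it matches every key-value pair except the last one, because there is no ampersand after the last variable. So I tried .*?\=.*?[&|$], which in theory should match a key-value pair that ends with either an ampersand or the end of the string.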

It still doesn't get the last pair. I have played with several other regexes and can't figure out what is going on.

Any ideas?

2 answers:

Answer 0: (score: 3)

If you insist on a regex... here it is.

.*?\=.*?(?:&|$)

It produces 24 matches. And

len(input.split('&')) 

is also 24.
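For anyone wondering why the [&|$] attempt fails: inside a character class, both | and $ are literal characters, so [&|$] matches an ampersand, a pipe, or a dollar sign, and never the end of the string. The sketch below compares the two patterns on a shortened, made-up payload (the payload string is an assumption for illustration, not the original request) and is written for Python 3.

```python
import re

# Shortened stand-in for the real payload (illustrative only).
payload = "X-gmsv=9879480&X-osv=25&gcm_ver=987948"

# Inside [...], '|' and '$' are literal characters, so this pattern
# still needs an actual '&', '|' or '$' character after each value.
char_class = re.findall(r".*?=.*?[&|$]", payload)

# With a non-capturing group, '$' is a real end-of-string anchor.
alternation = re.findall(r".*?=.*?(?:&|$)", payload)

print(char_class)    # the last pair is missing
print(alternation)   # all pairs, each with its trailing '&' if present
print(len(alternation) == len(payload.split("&")))
```

Note that each match still includes the trailing ampersand, so it would need to be stripped before splitting on the equals sign.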

Answer 1: (score: 3)

I strongly recommend not using a regex for this. Use the stdlib urlparse.parse_qs() function instead. It also handles URL decoding and so on:

>>> import urlparse
>>> urlparse.parse_qs('X-gmsv=9879480&X-subtype=cIIP-1V_bTg%3AAPA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW&X-X-subscription=cIIP-1V_bTg%3AAPA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW&X-gcm.topic=%2Ftopics%2Fphenotype_com.google.android.gms.icing%25servingVersion&X-X-subtype=cIIP-1V_bTg%3AAPA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW&X-app_ver=9879480&X-kid=%7CID%7C7%7C&X-osv=25')
{'X-subtype': ['cIIP-1V_bTg:APA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW'], 'X-osv': ['25'], 'X-X-subscription': ['cIIP-1V_bTg:APA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW'], 'X-kid': ['|ID|7|'], 'X-app_ver': ['9879480'], 'X-gmsv': ['9879480'], 'X-X-subtype': ['cIIP-1V_bTg:APA91bG-C3lFgSEzXCnuaLgpa4oJ0mI3NRk8Yv03NBOTfARjfBMWGhwy9J3d2dKUtGZHt6IKFmt7BBWRrQBqbvPoMobfZ2DP1Za0EyDzBqtfTLz9j-EYUHU1PWVjM2kMnOtIuA1s4EHW'], 'X-gcm.topic': ['/topics/phenotype_com.google.android.gms.icing%servingVersion']}

Note: in Python 3, this is urllib.parse.parse_qs().
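As a rough Python 3 sketch of the same idea, using a shortened payload built from a few of the pairs in the question (the payload string here is an assumption for illustration): parse_qs() groups values by key, which matters because X-gmsv appears twice in the original request, while parse_qsl() returns the pairs in order as a list of tuples.

```python
from urllib.parse import parse_qs, parse_qsl

# A few pairs taken from the question's payload, shortened for illustration.
payload = (
    "X-gmsv=9879480"
    "&X-kid=%7CID%7C7%7C"
    "&X-app_ver_name=9.8.79+%28480-137224771%29"
    "&X-gmsv=9879480"
)

# Dict of key -> list of values; repeated keys collect every value.
as_dict = parse_qs(payload)

# Ordered list of (key, value) tuples; duplicates are kept as separate entries.
as_pairs = parse_qsl(payload)

print(as_dict["X-gmsv"])
print(as_dict["X-app_ver_name"])
for key, value in as_pairs:
    print(key, "=", value)
```

If the order of the pairs or the exact number of occurrences matters, parse_qsl() is probably the closer fit to "pull out all key=value pairs".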