Remove unwanted key-value pairs from a string

Time: 2018-03-09 03:43:36

Tags: python parsing

So I have the following string:

__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; expires=Sat, 09-Mar-19 03:35:03 GMT; path=/; domain=.coinmarketcap.com; HttpOnly, _version=a90f44e909c03fdad3caed1ec676a98472deb0f6; path=/, __session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974; domain=.coinmarketcap.com path=/

But I need to strip the junk out of it, such as

expires=Sat, 09-Mar-19 03:35:03 GMT

domain=.coinmarketcap.com path=/

so that I'm left with only three values:

__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; _version=a90f44e909c03fdad3caed1ec676a98472deb0f6; __session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974

4 answers:

Answer 0 (score: 0)

Specify the keys you want to keep:

In [193]: keys = ['__cfduid', '_version', '__session']

Now call re.findall (after import re):

In [194]: ' '.join(re.findall(r'(?:{}).*?;'.format('|'.join(keys)), text))
Out[194]: '__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; _version=a90f44e909c03fdad3caed1ec676a98472deb0f6; __session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974;'

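The regex (?:{}).*?; means you only look for the key-value pairs that belong to the selected keys; everything else is discarded. This works as long as your string has a consistent structure ((key=value;)+).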
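For reference, here is a self-contained version of the same findall approach, run on the string from the question. The helper name keep_keys is hypothetical, and three details are additions that are not in the answer: re.escape makes each key match literally, and the leading \b plus the = after the key stop a key from matching inside a longer name (for example a cookie called __session_id).

```python
import re

# The raw string from the question (several Set-Cookie headers joined by ", ").
SAMPLE = (
    "__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; "
    "expires=Sat, 09-Mar-19 03:35:03 GMT; path=/; domain=.coinmarketcap.com; HttpOnly, "
    "_version=a90f44e909c03fdad3caed1ec676a98472deb0f6; path=/, "
    "__session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974; "
    "domain=.coinmarketcap.com path=/"
)


def keep_keys(text, keys):
    """Return the 'key=value;' pairs for the given keys, joined by spaces (hypothetical helper)."""
    # re.escape treats each key literally; \b and the trailing '=' are
    # additions to the answer's pattern so a key cannot match inside a longer name.
    alternation = "|".join(re.escape(k) for k in keys)
    pattern = r"\b(?:{})=.*?;".format(alternation)
    # Non-greedy .*? stops at the first ';' after each matching key.
    return " ".join(re.findall(pattern, text))


print(keep_keys(SAMPLE, ["__cfduid", "_version", "__session"]))
```

Like the original, this relies on every wanted pair being terminated by a ';'; a pair at the very end of the string with no trailing ';' would be missed.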

Answer 1 (score: 0)

Here is a more general solution that works for any key starting with an underscore:

import re
str_list = re.findall(r"_\w+=\w+", your_string)

out:
    ['__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503',
     '_version=a90f44e909c03fdad3caed1ec676a98472deb0f6',
     '__session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ']

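re.findall returns a list, so you can join it with "; ".join(str_list) to get the desired output. Note that \w does not match %, so the __session value is cut off at its first % character, as the output above shows.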
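One possible adjustment that keeps the full __session value is to match the value as "anything up to the next ;, comma or whitespace" instead of \w+. This is an assumption about the data rather than part of the answer; it holds here because RFC 6265 does not allow those characters in an unquoted cookie value.

```python
import re

SAMPLE = (
    "__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; "
    "expires=Sat, 09-Mar-19 03:35:03 GMT; path=/; domain=.coinmarketcap.com; HttpOnly, "
    "_version=a90f44e909c03fdad3caed1ec676a98472deb0f6; path=/, "
    "__session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974; "
    "domain=.coinmarketcap.com path=/"
)

# The original r"_\w+=\w+" stops at '%', because \w does not match it.
# Assumption: a value never contains ';', ',' or whitespace, so take
# everything up to the next one of those characters as the value.
str_list = re.findall(r"_\w+=[^;,\s]+", SAMPLE)
print("; ".join(str_list))
```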


Answer 2 (score: 0)

Another approach:

keys = ('__cfduid', '_version', '__session')
' '.join([x for x in text.split() if x.startswith(keys)])
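A self-contained run of the same split-and-filter idea on the question's string. It relies on str.startswith accepting a tuple of prefixes. Two details are additions, not part of the answer: appending "=" to each key so that a longer name sharing the prefix is not kept, and stripping the trailing ";" or "," from each token so the result matches the desired "; "-separated output exactly.

```python
SAMPLE = (
    "__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; "
    "expires=Sat, 09-Mar-19 03:35:03 GMT; path=/; domain=.coinmarketcap.com; HttpOnly, "
    "_version=a90f44e909c03fdad3caed1ec676a98472deb0f6; path=/, "
    "__session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974; "
    "domain=.coinmarketcap.com path=/"
)

KEYS = ("__cfduid", "_version", "__session")

# str.startswith accepts a tuple and is True if any prefix matches.
# The "=" suffix (an addition) keeps e.g. a hypothetical "__session_id" out.
prefixes = tuple(key + "=" for key in KEYS)
tokens = [tok for tok in SAMPLE.split() if tok.startswith(prefixes)]

# Each kept token still ends with its ';' or ',' separator; drop it and rejoin.
result = "; ".join(tok.rstrip(";,") for tok in tokens)
print(result)
```

Splitting on whitespace works here because no cookie value contains a space; the junk attributes are simply the tokens that do not start with one of the prefixes.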

Answer 3 (score: 0)

It looks like you are parsing a cookie string. In that case you should use the standard library's cookie-parsing module: https://docs.python.org/2/library/cookie.html#Cookie.BaseCookie.load

>>> from Cookie import SimpleCookie
>>> s = SimpleCookie()
>>> s.load("__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; expires=Sat, 09-Mar-19 03:35:03 GMT; path=/; domain=.coinmarketcap.com; HttpOnly, _version=a90f44e909c03fdad3caed1ec676a98472deb0f6; path=/, __session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974; domain=.coinmarketcap.com path=/")
>>> [(k, s[k].value) for k in s.keys()]
[('__cfduid', 'dc3c9f85f65d39a5947d5f4850618237f1520566503'),
 ('_version', 'a90f44e909c03fdad3caed1ec676a98472deb0f6'),
 ('__session', 'NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974')]

>>> s['__cfduid'].value
'dc3c9f85f65d39a5947d5f4850618237f1520566503'

(This is Python 2. In Python 3 the module was renamed, so the import is from http.cookies import SimpleCookie.)

This is much better than trying to roll your own cookie parsing.
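A Python 3 sketch of the same approach with http.cookies.SimpleCookie. It assumes, as the question's string suggests, that the input is several Set-Cookie headers folded together with ", ". The helper split_set_cookie is hypothetical: it splits only at a comma that is followed by name=, so the comma inside the expires date is kept. Each header is then passed to SimpleCookie.load on its own, so attributes such as path, domain and HttpOnly end up on the Morsel instead of becoming keys of the jar.

```python
import re
from http.cookies import SimpleCookie

SAMPLE = (
    "__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; "
    "expires=Sat, 09-Mar-19 03:35:03 GMT; path=/; domain=.coinmarketcap.com; HttpOnly, "
    "_version=a90f44e909c03fdad3caed1ec676a98472deb0f6; path=/, "
    "__session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974; "
    "domain=.coinmarketcap.com path=/"
)


def split_set_cookie(folded):
    """Split Set-Cookie headers joined with ', ' (hypothetical helper).

    A comma starts a new header only when the next token looks like 'name=';
    in 'expires=Sat, 09-Mar-19 ...' the comma is followed by a date, so it stays.
    """
    return re.split(r",\s*(?=[^;,\s]+=)", folded)


jar = SimpleCookie()
for header in split_set_cookie(SAMPLE):
    # Attributes (expires, path, domain, HttpOnly) are stored on the Morsel.
    jar.load(header)

# Only the real cookie names are keys of the jar.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print(cookie_header)
```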