How to parse multiple dictionary values that share the same key

Time: 2019-07-17 17:37:14

Tags: python json dictionary parsing abstract-syntax-tree

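I have many rows of data (which I cannot modify by hand) that are represented as dictionaries of key/value pairs. The problem is that one dictionary key can appear several times with different values (an undefined number of times: it could be twice, three times, 10 times, and so on).

I need to extract all of these values.

Here is a simple record that contains two values for the key Key-Word: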

  

{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}

I wrote this Python script to extract the values from a record.

import ast
import re
import json


inFile = open("sample.txt","r",errors="replace") 


cP=0 # key found flag
cV=0 # hold the key's value


try:
    myDict = {"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}
    smallmyDict= {}

except (ValueError, SyntaxError) as E:
    cV="error"
except Exception as E:
    cV="error"

# convert the header's key to small letter
for key, value in myDict.items():
    smallmyDict[key.lower()] = value

# store all keys
smallmyDictKeys =smallmyDict.keys()



# search for a specific key
if 'key-word' in smallmyDictKeys: 
    cP=1
    cV = smallmyDict['key-word']
    print("Found!")
    print(cV) #print the key's value
else:
    print("NOT Found!")

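The output I get is:

Found!
xn

The problem is that it only prints the value of the last occurrence of the key.

If the key appears more than once, how can I make my code go through every occurrence of the key I am looking for and print each value separately, instead of having it overwritten by the last one?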

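5 answers:

Answer 0 (score: 1)

You can parse your data with json and use the object_pairs_hook parameter of json.loads to customize how it is processed. In the example below, I group the different values of the same key and, as requested in your comment, concatenate them in a string rather than keeping them in a list: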

import json
from collections import Counter, defaultdict

data = """{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}

"""

def duplicate_keys(pairs):
    out = {}
    dups = defaultdict(list)
    key_count = Counter(key for key, value in pairs)

    for key, value in pairs:
        if key_count[key] == 1:
            out[key] = value
        else:
            dups[key].append(value)

    # Concatenate the lists of values in a string, enclosed in {} and separated by ';'
    # rather than in a list:       
    dups = {key: ';'.join('{' + v + '}' for v in values) for key, values in dups.items()}

    out.update(dups)
    return out

decoded = json.loads(data, object_pairs_hook=duplicate_keys)
print(decoded)

# {'Date': 'Fri, 19 Apr 2019 00:54:46 GMT', 
#  'Vary': 'Host,Accept-Encoding', 
#  'Cache-Control': 'private', 
#  'Key-Word': '{00a};{xn}'}
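
If you would rather keep the values as a list and look keys up case-insensitively, as the script in the question does, the same hook idea can be sketched as follows. This is only an illustration: the names collect_lowercase, record, headers and key_word_values are made up for this example, and it assumes each record is valid JSON.

```python
import json
from collections import defaultdict


def collect_lowercase(pairs):
    """Hypothetical hook: group all values under lower-cased keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        # Every value goes into a list, so repeated keys are never overwritten.
        grouped[key.lower()].append(value)
    return dict(grouped)


record = ('{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", '
          '"Vary": "Host,Accept-Encoding", "Key-Word": "00a", '
          '"Cache-Control": "private", "Key-Word": "xn"}')

headers = json.loads(record, object_pairs_hook=collect_lowercase)

# .get() with a default avoids a KeyError when the key is absent.
key_word_values = headers.get("key-word", [])
for value in key_word_values:
    print(value)
# 00a
# xn
```

Because every key maps to a list, the calling code does not need to special-case keys that occur only once.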

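Answer 1 (score: 0)

A dictionary cannot have two keys with the same name. One will overwrite the other. At runtime only one pair exists for that key (the last entry).

https://www.python-course.eu/dictionaries.php is a good resource for reading up on dictionaries.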
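
A minimal sketch of that overwrite, using a shortened version of the record from the question (the variable name my_dict is only illustrative, and the key order shown assumes Python 3.7 or later):

```python
# The duplicate is collapsed while the literal is evaluated,
# before any later code gets to look at the dictionary.
my_dict = {"Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}

print(len(my_dict))         # 2
print(my_dict["Key-Word"])  # xn
print(list(my_dict))        # ['Key-Word', 'Cache-Control']
```

So the first value is already gone once the dict exists. To recover every value, the record has to be handled as text (or as an AST) before it becomes a dict.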

Answer 2 (score: 0)

You can parse the string yourself and store the values in a dictionary as lists:

import ast
from pprint import pprint

def parse_dict_multikey(s):
    p = ast.parse(s)
    exp_dict = p.body[0].value
    keys = list(map(ast.literal_eval, exp_dict.keys))
    values = list(map(ast.literal_eval, exp_dict.values))
    d = {}
    for k, v in zip(keys, values):
        d.setdefault(k, []).append(v)
    return d

s = ('{"Date": "Fri, 19 Apr 2019 00:54:46 GMT",'
     ' "Vary": "Host,Accept-Encoding",'
     ' "Key-Word": "00a",'
     ' "Cache-Control": "private",'
     ' "Key-Word": "xn"}')
pprint(parse_dict_multikey(s))
# {'Cache-Control': ['private'],
#  'Date': ['Fri, 19 Apr 2019 00:54:46 GMT'],
#  'Key-Word': ['00a', 'xn'],
#  'Vary': ['Host,Accept-Encoding']}

However, this wraps every value in a list, not only the values of duplicated keys. As Thierry Lathuille shows, you can avoid that by using a Counter:

from collections import Counter

def parse_dict_multikey(s):
    p = ast.parse(s)
    exp_dict = p.body[0].value
    keys = list(map(ast.literal_eval, exp_dict.keys))
    values = list(map(ast.literal_eval, exp_dict.values))
    c = Counter(keys)
    d = {}
    for k, v in zip(keys, values):
        if c[k] > 1:
            d.setdefault(k, []).append(v)
        else:
            d[k] = v
    return d

Which gives you:

{'Cache-Control': 'private',
 'Date': 'Fri, 19 Apr 2019 00:54:46 GMT',
 'Key-Word': ['00a', 'xn'],
 'Vary': 'Host,Accept-Encoding'}

You could also look into something more advanced, such as multidict.

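Answer 3 (score: 0)

Because of the duplicate keys, loading your data directly with json keeps only the last value for each key. Try the following instead: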

from collections import defaultdict

string = '{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}'

data = defaultdict(list)

# Naive split: assumes no value contains '",' and no key contains ':'
pieces = string.strip('{}').split('",')

for each_piece in pieces:
    key, value = each_piece.split(':', maxsplit=1)
    actual_key = key.strip(' "')      # drop surrounding spaces and quotes
    actual_value = value.strip(' "')
    data[actual_key].append(actual_value)

print(data)

Output

defaultdict(<class 'list'>, {'Date': ['Fri, 19 Apr 2019 00:54:46 GMT'], 'Vary': ['Host,Accept-Encoding'], 'Key-Word': ['00a', 'xn'], 'Cache-Control': ['private']})
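
Manual splitting breaks as soon as a value contains `",` or a key contains `:`. As a hedged alternative for the many-rows case, the sketch below lets json do the tokenizing and asks for the raw key/value pairs via object_pairs_hook=list. The helper name values_for_key and the sample lines are invented for this illustration, and it assumes every valid line is a single flat JSON object.

```python
import json


def values_for_key(line, wanted):
    """Return every value stored under `wanted` (case-insensitive) in one record."""
    try:
        # With object_pairs_hook=list, a JSON object is returned as a
        # list of (key, value) tuples, so duplicate keys are preserved.
        pairs = json.loads(line, object_pairs_hook=list)
    except json.JSONDecodeError:
        return []  # skip lines that are not valid JSON
    return [value for key, value in pairs if key.lower() == wanted.lower()]


# Stand-in for the lines of the input file.
lines = [
    '{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Key-Word": "00a", '
    '"Cache-Control": "private", "Key-Word": "xn"}',
    '{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host"}',
    'not a record',
]

results = [values_for_key(line, "key-word") for line in lines]
print(results)
# [['00a', 'xn'], [], []]
```

Records without the key and malformed lines both yield an empty list, so the caller can tell "found nothing" apart from a crash.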

Answer 4 (score: 0)

When you define the dictionary myDict = {"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}, you need to use different keys for the values 00a and xn.

You can use, or convert it to, a string: some_str = '{"Date": "Fri, 19 Apr 2019 00:54:46 GMT", "Vary": "Host,Accept-Encoding", "Key-Word": "00a", "Cache-Control": "private", "Key-Word": "xn"}'