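I've looked at the following resources: Remove python dict item from nested json file, but I can't seem to get my code to work. From my understanding of the JSON below (which is a variable placeholder for a WAY longer dump), it's a dict containing a dict, which contains a dict, with a list randomly thrown in there... What I ultimately want to see is the following printed to my terminal: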
Message: [Client ID]
Link: "http://linkgoeshere.com"
Here's what I have so far:
ThreeLine = {u'hits': {u'hits': [{u'_id': u'THIS IS THE FIRST ONE',
                                  u'_index': u'foo',
                                  u'_score': None,
                                  u'_source': {u'@timestamp': u'2015-12-21T16:59:40.000-05:00',
                                               u'message': u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}',
                                               u'system': u'user-info'}},
                                 {u'_id': u'THIS IS THE SECOND ONE',
                                  u'_index': u'two',
                                  u'_score': None,
                                  u'_source': {u'@timestamp': u'2015-12-12 T16:59:40.000-05:00',
                                               u'message': u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":565656} {"tags":{"sent":"21","HTML":"4512"},"person":"15651"}',
                                               u'system': u'user-info'}},
                                 ]}}

unpacking = ThreeLine['hits']['hits']  # we only want to talk to the sort dictionary.
for d in unpacking:
    newinfo = []
    narrow = [d["_source"] for d in unpacking if "_source" in d]
    narrower = [d["message"] for d in narrow if "message" in d]
    newinfo.append(narrower)
    print newinfo
Right now, with the code as-is, it prints both entries, but with a lot of random junk I don't care about, like all the tags:
{"tags":{"sent":"21","HTML":"4512"},"person":"15651"}',
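So how do I strip those out further, so that I end up with just the two lines I want out of this crazy nested mess? If anyone has ideas on how to clean up the current code, I'm all ears and ready to learn!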
Answer (score: 0)
The 'tags' dictionary isn't a dictionary. It's text embedded in the message string:
>>> ThreeLine['hits']['hits'][0]['_source']['message']
u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}'
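To make that concrete, here is a small check against a trimmed copy of the first hit (the names `hit`, `source` and `message` are illustrative, not from the original code). `tags` is not a key at any level of the dict; it only appears as characters inside `message`, which is why the dict-deletion approach from the linked question has nothing to delete here.

```python
# Trimmed copy of the first hit from the question's ThreeLine data (illustrative).
hit = {
    "_id": "THIS IS THE FIRST ONE",
    "_source": {
        "message": ('Application.INFO: [Client ID ] Information Link: http://google.com '
                    '{"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}'),
        "system": "user-info",
    },
}

source = hit["_source"]
message = source["message"]

# "tags" is not a key of the hit or of its _source dict...
print("tags" in hit)      # False
print("tags" in source)   # False
# ...it only exists as characters inside the message string,
# so there is no dict item to delete; the text has to be parsed instead.
print("tags" in message)  # True
```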
You'll have to do some string parsing to remove it. You could use a regular expression:
import re

id_and_link = re.compile(r'(\[[^]]+\]) Information Link: (https?://[\w\d/.]+)')

messages = (entry['_source']['message']
            for entry in ThreeLine['hits']['hits']
            if '_source' in entry and 'message' in entry['_source'])

for message in messages:
    match = id_and_link.search(message)
    if not match:
        continue

    id_, link = match.groups()
    print 'Message:', id_
    print 'Link:', link
    print
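If you want the output to match the two-line format from the question exactly (no space before the closing `]`, link in quotes), one option is to wrap the same regex in a small function. This is a sketch: `extract_id_and_link` and `format_entry` are hypothetical helper names, and it assumes the padding just inside the brackets should simply be stripped. It uses `print()` with a single argument so it runs under both Python 2 and 3.

```python
import re

# Same pattern as above: a bracketed ID followed by "Information Link: <url>".
ID_AND_LINK = re.compile(r'(\[[^]]+\]) Information Link: (https?://[\w\d/.]+)')


def extract_id_and_link(message):
    """Return (id, link) from one message string, or None if the pattern is absent.

    Assumption: whitespace just inside the brackets is noise, so
    "[Client ID ]" is normalized to "[Client ID]".
    """
    match = ID_AND_LINK.search(message)
    if match is None:
        return None
    raw_id, link = match.groups()
    clean_id = "[" + raw_id[1:-1].strip() + "]"
    return clean_id, link


def format_entry(id_, link):
    """Render one entry in the two-line layout shown in the question."""
    return 'Message: {}\nLink: "{}"'.format(id_, link)


sample = ('Application.INFO: [Client ID ] Information Link: http://google.com '
          '{"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}')

result = extract_id_and_link(sample)
if result is not None:
    print(format_entry(*result))
```

Note that the character class `[\w\d/.]` only allows letters, digits, underscores, slashes and dots, so a URL containing a hyphen or a query string would be cut short. If links never contain spaces, widening the second group to `(https?://\S+)` is one way around that.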