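I have a lot of JSON requests that I want to visualize. The JSON requests are stored in .blob files. The problem is that the requests are deeply nested, and I can't figure out an efficient piece of code that writes all of the data into a DataFrame.
This is my current code. It works, but it is not efficient.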
    import glob
    import json
    import os

    import pandas as pd

    path_to_blob = '/mnt/data/'
    # Lazily find every .blob file below the root directory
    read_files = glob.iglob(os.path.join(path_to_blob, "**/*.blob"), recursive=True)
    frames = []
    for file in read_files:
        # Each line of a .blob file holds one JSON document
        with open(file, encoding="utf8") as f:
            data = [json.loads(line) for line in f]
        all_data = pd.json_normalize(data)
        request_data = pd.json_normalize(data, record_path=['request'])
        # Place the flattened request fields next to the top-level fields
        dataset = pd.concat([request_data, all_data], axis=1)
        frames.append(dataset)
    dataframe = pd.concat(frames)
This is one of the requests:
{"request":[{"id":"12345678","name":"GET navigation/Index","count":123,"responseCode":123,"success":true,"url":"http://server1.test.com/12345678","urlData":{"base":"/navigation/123456","host":"server1.test.com","hashTag":"","protocol":"http"},"durationMetric":{"value":12345.0,"count":123.0,"min":12345.0,"max":12345.0,"stdDev":0.0,"sampledValue":12345.0}}],"internal":{"data":{"id":"12345678","documentVersion":"123.0"}},"context":{"data":{"eventTime":"2020-5-5","isSynthetic":false,"samplingRate":123.0},"cloud":{},"device":{"type":"PC","roleName":"ROLENAME","roleInstance":"SERVERNAME","screenResolution":{}},"session":{"isFirst":false},"operation":{"id":"12345678=","parentId":"12345678=","name":"GET navigation/url"},"location":{"clientip":"0.0.0.0","continent":"Europe","country":"Netherlands"},"custom":{"dimensions":[{"_MS.ProcessedByMetricExtractors":"(Name:'Requests', Ver:'123.0')"},{"InstanceKey":"12345678"}]}}}
I recently read about dask, and using it seems sensible because the data set is 1.2 TB. Can someone tell me how to get these nested JSON requests into a DataFrame?
Thanks!
Answer 0 (score: 0)
Python's standard library is easy to overlook for this problem. What you are looking for is
json.loads()
This code:

    import json
    from pprint import pprint

    # Read the whole file and parse it as a single JSON document
    with open("test.json", "r") as rf:
        jx = rf.read()
    jx = json.loads(jx)
    pprint(jx)

returns a dictionary:
    {'context': {'cloud': {},
                 'custom': {'dimensions': [{'_MS.ProcessedByMetricExtractors': "(Name:'Requests', "
                                                                               "Ver:'123.0')"},
                                           {'InstanceKey': '12345678'}]},
                 'data': {'eventTime': '2020-5-5',
                          'isSynthetic': False,
                          'samplingRate': 123.0},
                 'device': {'roleInstance': 'SERVERNAME',
                            'roleName': 'ROLENAME',
                            'screenResolution': {},
                            'type': 'PC'},
                 'location': {'clientip': '0.0.0.0',
                              'continent': 'Europe',
                              'country': 'Netherlands'},
                 'operation': {'id': '12345678=',
                               'name': 'GET navigation/url',
                               'parentId': '12345678='},
                 'session': {'isFirst': False}},
     'internal': {'data': {'documentVersion': '123.0', 'id': '12345678'}},
     'request': [{'count': 123,
                  'durationMetric': {'count': 123.0,
                                     'max': 12345.0,
                                     'min': 12345.0,
                                     'sampledValue': 12345.0,
                                     'stdDev': 0.0,
                                     'value': 12345.0},
                  'id': '12345678',
                  'name': 'GET navigation/Index',
                  'responseCode': 123,
                  'success': True,
                  'url': 'http://server1.test.com/12345678',
                  'urlData': {'base': '/navigation/123456',
                              'hashTag': '',
                              'host': 'server1.test.com',
                              'protocol': 'http'}}]}
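
To get from that dictionary to rows for a DataFrame, the nesting still has to be flattened. The following is only a minimal sketch using the standard library, not a definitive solution: the helper names `flatten`, `request_rows` and `rows_from_lines` are made up for illustration, the sample document is a trimmed version of the request in the question, and it assumes (as the question's code does) that each line of a .blob file holds one JSON document.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects; empty dicts contribute no columns
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            # Scalars and lists (e.g. context.custom.dimensions) are kept as-is
            flat[name] = value
    return flat


def request_rows(document):
    """Yield one flat row per entry in the top-level "request" list."""
    # Fields outside "request" are shared by every row of this document
    shared = flatten({k: v for k, v in document.items() if k != "request"})
    for entry in document.get("request", []):
        row = flatten(entry, prefix="request.")
        row.update(shared)
        yield row


def rows_from_lines(lines):
    """Parse line-delimited JSON lazily, one document per non-empty line."""
    for line in lines:
        if line.strip():
            yield from request_rows(json.loads(line))


# Trimmed sample based on the request shown in the question
sample_doc = {
    "request": [{"id": "12345678", "count": 123,
                 "urlData": {"host": "server1.test.com"}}],
    "internal": {"data": {"id": "12345678"}},
    "context": {"cloud": {}, "location": {"country": "Netherlands"}},
}
rows = list(rows_from_lines([json.dumps(sample_doc)]))
print(rows[0])
```

The resulting list of flat dicts can be passed to `pandas.DataFrame(rows)`. Because `rows_from_lines` is a generator, it could also be applied file by file or partition by partition instead of collecting everything in memory first, which matters for a data set of this size.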