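I have a pandas DataFrame containing Windows 10 logs, and I want to convert it to JSON. What is an efficient way to do this?
I can already produce the default pandas JSON output, but it isn't nested. This is what I have: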
{
"0": {
"ProcessName": "Firefox",
"time": "2019-07-12T00:00:00",
"timeFloat": 1562882400.0,
"internal_time": 0.0,
"counter": 0
},
"1": {
"ProcessName": "Excel",
"time": "2019-07-12T00:00:00",
"timeFloat": 1562882400.0,
"internal_time": 0.0,
"counter": 0
},
"2": {
"ProcessName": "Word",
"time": "2019-07-12T01:30:00",
"timeFloat": 1562888000.0,
"internal_time": 1.5533333333,
"counter": 0
}
}
I would like it to look like this:
{
"0": {
"time": "2019-07-12T00:00:00",
"timeFloat": 1562882400.0,
"internal_time": 0.0,
"Processes": {
"Firefox": 0,  # the "counter" value
"Excel": 0
}
},
"1": ...
}
Answer 0 (score: 2)
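It looks to me like you want to build the JSON from data aggregated on ['time', 'timeFloat', 'internal_time'], which you can do with:
df.groupby(['time', 'timeFloat', 'internal_time'])
However, your example suggests that you also want to keep the index keys ("0", "1", and so on), which contradicts that intent.
The aggregated values for a given point in time: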
"Firefox" : 0
"Excel" : 0
seem to correspond to those index keys, and those keys are lost once you aggregate.
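To make the point about lost index keys concrete, here is a minimal sketch. It assumes a cut-down DataFrame holding only the `ProcessName` and `time` columns from the question, with the original keys "0", "1", "2" as the index; the variable name `members` is made up for illustration.

```python
import pandas as pd

# Cut-down version of the question's data; the index holds the original keys.
df = pd.DataFrame(
    {
        "ProcessName": ["Firefox", "Excel", "Word"],
        "time": [
            "2019-07-12T00:00:00",
            "2019-07-12T00:00:00",
            "2019-07-12T01:30:00",
        ],
    },
    index=["0", "1", "2"],
)

# .groups maps each group key to the index labels of the rows it absorbed.
members = {key: list(labels) for key, labels in df.groupby("time").groups.items()}
print(members)
```

Rows "0" and "1" end up in the same group, so no single original key can label the aggregated record.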
Still, if you decide to go with aggregation, the code would look like this:
# reading in data:
import pandas as pd
import json
json_data = {
"0": {
"ProcessName": "Firefox",
"time": "2019-07-12T00:00:00",
"timeFloat": 1562882400.0,
"internal_time": 0.0,
"counter": 0
},
"1": {
"ProcessName": "Excel",
"time": "2019-07-12T00:00:00",
"timeFloat": 1562882400.0,
"internal_time": 0.0,
"counter": 0
},
"2": {
"ProcessName": "Word",
"time": "2019-07-12T01:30:00",
"timeFloat": 1562888000.0,
"internal_time": 1.5533333333,
"counter": 0
}}
# one row per log entry; the outer keys ("0", "1", ...) become the index
df = pd.DataFrame.from_dict(json_data, orient="index")
# processing:
ddf = df.groupby(['time', 'timeFloat', 'internal_time'], as_index=False).agg(lambda x: list(x))
ddf['Processes'] = ddf.apply(lambda r: dict(zip(r['ProcessName'], r['counter'])), axis=1)
ddf = ddf.drop(['ProcessName', 'counter'], axis=1)
# printing the result:
json2 = json.loads(ddf.to_json(orient="records"))
print(json.dumps(json2, indent=4, sort_keys=True))
Result:
[
{
"Processes": {
"Excel": 0,
"Firefox": 0
},
"internal_time": 0.0,
"time": "2019-07-12T00:00:00",
"timeFloat": 1562882400.0
},
{
"Processes": {
"Word": 0
},
"internal_time": 1.5533333333,
"time": "2019-07-12T01:30:00",
"timeFloat": 1562888000.0
}
]
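If the enumerated keys from the question ("0", "1", ...) are wanted instead of a list of records, one possible variation is to loop over the groups and number them yourself. This is only a sketch: the `rows` list repeats the sample data from the question, and the names `keys` and `nested` are made up for illustration.

```python
import json
import pandas as pd

# Sample rows copied from the question.
rows = [
    {"ProcessName": "Firefox", "time": "2019-07-12T00:00:00",
     "timeFloat": 1562882400.0, "internal_time": 0.0, "counter": 0},
    {"ProcessName": "Excel", "time": "2019-07-12T00:00:00",
     "timeFloat": 1562882400.0, "internal_time": 0.0, "counter": 0},
    {"ProcessName": "Word", "time": "2019-07-12T01:30:00",
     "timeFloat": 1562888000.0, "internal_time": 1.5533333333, "counter": 0},
]
df = pd.DataFrame(rows)

keys = ["time", "timeFloat", "internal_time"]
nested = {}
# sort=False keeps the groups in order of first appearance.
for i, ((time_, time_float, internal_time), group) in enumerate(
    df.groupby(keys, sort=False)
):
    nested[str(i)] = {
        "time": str(time_),
        "timeFloat": float(time_float),
        "internal_time": float(internal_time),
        # .tolist() yields plain Python values that json can serialize
        "Processes": dict(
            zip(group["ProcessName"].tolist(), group["counter"].tolist())
        ),
    }

print(json.dumps(nested, indent=4))
```

The new keys are just group numbers; they no longer refer to the row labels of the original DataFrame.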
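Answer 1 (score: 1)
As I understand it, you need to group the objects by "time" and merge the counters from the different processes. If so, here is an example implementation: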
input_data = {
"0": {
"ProcessName": "Firefox",
"time": "2019-07-12T00:00:00",
"timeFloat": 1562882400.0,
"internal_time": 0.0,
"counter": 0
},
"2": {
"ProcessName": "ZXC",
"time": "2019-07-12T00:00:00",
"timeFloat": 1562882400.0,
"internal_time": 0.0,
"counter": 0
},
"3": {
"ProcessName": "QWE",
"time": "else_time",
"timeFloat": 1562882400.0,
"internal_time": 0.0,
"counter": 0
}
}
import json

def group_input_data_by_time(dict_data):
    time_data = {}
    for value_dict in dict_data.values():
        counter = value_dict["counter"]
        process_name = value_dict["ProcessName"]
        time_ = value_dict["time"]
        # fields shared by every entry with the same "time"
        common_data = {
            "time": time_,
            "timeFloat": value_dict["timeFloat"],
            "internal_time": value_dict["internal_time"],
        }
        # reuse the record already stored for this time, if there is one
        common_data = time_data.setdefault(time_, common_data)
        processes = common_data.setdefault("Processes", {})
        processes[process_name] = counter
    # if required to change keys from time to enumerated
    result_dict = {}
    for ind, value in enumerate(time_data.values()):
        result_dict[str(ind)] = value
    return result_dict

print(json.dumps(group_input_data_by_time(input_data), indent=4))
The result is:
{
"0": {
"time": "2019-07-12T00:00:00",
"timeFloat": 1562882400.0,
"internal_time": 0.0,
"Processes": {
"Firefox": 0,
"ZXC": 0
}
},
"1": {
"time": "else_time",
"timeFloat": 1562882400.0,
"internal_time": 0.0,
"Processes": {
"QWE": 0
}
}
}
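One thing to be aware of: this function groups on "time" alone, so if two entries share a "time" but differ in "timeFloat" or "internal_time", the values from the first entry win. If that matters, a possible variation is to key on all three fields. The sketch below uses only the standard library; the function name `group_by_all_common_fields` and the sample values in `sample` are made up for illustration.

```python
import json

def group_by_all_common_fields(dict_data):
    """Group entries on (time, timeFloat, internal_time) and merge counters."""
    grouped = {}
    for entry in dict_data.values():
        key = (entry["time"], entry["timeFloat"], entry["internal_time"])
        # Create the record the first time this key is seen, else reuse it.
        record = grouped.setdefault(key, {
            "time": entry["time"],
            "timeFloat": entry["timeFloat"],
            "internal_time": entry["internal_time"],
            "Processes": {},
        })
        record["Processes"][entry["ProcessName"]] = entry["counter"]
    # Re-key the groups as "0", "1", ... in order of first appearance.
    return {str(i): record for i, record in enumerate(grouped.values())}

# Hypothetical data: same "time" throughout, but the third entry has a
# different "internal_time", so it lands in its own group.
sample = {
    "0": {"ProcessName": "Firefox", "time": "2019-07-12T00:00:00",
          "timeFloat": 1562882400.0, "internal_time": 0.0, "counter": 0},
    "1": {"ProcessName": "Excel", "time": "2019-07-12T00:00:00",
          "timeFloat": 1562882400.0, "internal_time": 0.0, "counter": 3},
    "2": {"ProcessName": "Word", "time": "2019-07-12T00:00:00",
          "timeFloat": 1562882400.0, "internal_time": 2.5, "counter": 1},
}
grouped_sample = group_by_all_common_fields(sample)
print(json.dumps(grouped_sample, indent=4))
```

To feed a DataFrame into either function, `df.to_dict(orient="index")` produces a dictionary in the same shape as the input used here.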