This is the JSON data I currently have. I need to load it into a pandas DataFrame so I can work with it.
{
"lab1": {
"co2": [
9.559335530495726
],
"occupancy": [
4
],
"temperature": [
21.033629524242304
],
"time": "2020-09-15T16:15:35.565629"
}
}
{
"class1": {
"co2": [
24.168445969175817
],
"occupancy": [
15
],
"temperature": [
26.176607611778156
],
"time": "2020-09-15T16:15:36.027525"
}
}
{
"office": {
"co2": [
6.633787232630541
],
"occupancy": [
1
],
"temperature": [
27.727982558797844
],
"time": "2020-09-15T16:15:36.608386"
}
}
I tried json_normalize, but I don't understand how to normalize this JSON data.
with open('data.json','r') as f:
    data = json.loads(f.read())
# Normalizing data
data1 = pd.json_normalize(data, record_path =['Results'])
# Saving to CSV format
data1.to_csv('multiplelevel_normalized_data.csv', index=False)
When I run this code, I get the following error:
JSONDecodeError                           Traceback (most recent call last)
      1 with open('data.json','r') as f:
----> 2     data = json.loads(f.read())

JSONDecodeError: Extra data: line 14 column 2 (char 240)
Answer 0: (score: 0)
You can use pandas read_json.
First use a regular expression to remove every '[' and ']' from the data. Then save the result as a JSON file.
import pandas as pd
pd.read_json (r'Path where you saved the JSON file/filename.json')
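The bracket-stripping step is not spelled out above. Here is a rough sketch of one reading of it, applied to a single object from the question. The regex drops every '[' and ']', and DataFrame.from_dict is swapped in for read_json so that no file is needed. It assumes no string value contains brackets, and on its own it does not deal with several objects sitting back to back in one file.

```python
import json
import re

import pandas as pd

# One object copied from the question, held in a string instead of a file.
raw = '''{
  "lab1": {
    "co2": [9.559335530495726],
    "occupancy": [4],
    "temperature": [21.033629524242304],
    "time": "2020-09-15T16:15:35.565629"
  }
}'''

# Remove every '[' and ']' so each one-element list becomes a plain scalar.
stripped = re.sub(r'[\[\]]', '', raw)

# orient='index' gives one row per top-level key (the room name).
df = pd.DataFrame.from_dict(json.loads(stripped), orient='index')
print(df)
```

Stripping brackets only makes sense while each list holds exactly one value; with longer lists the text would no longer be valid JSON.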
Answer 1: (score: 0)
Here is an approach that does not use regular expressions.
import pandas as pd
data = [
{'lab1': {'co2': [9.559335530495726],
'occupancy': [4],
'temperature': [21.033629524242304],
'time': '2020-09-15T16:15:35.565629'}},
{'class1': {'co2': [24.168445969175817],
'occupancy': [15],
'temperature': [26.176607611778156],
'time': '2020-09-15T16:15:36.027525'}},
{'office': {'co2': [6.633787232630541],
'occupancy': [1],
'temperature': [27.727982558797844],
'time': '2020-09-15T16:15:36.608386'}}
]
Now iterate over the list of dictionaries. Use explode() to flatten the one-element lists.
df = list()
for d in data:
    for key, values in d.items():
        t = (pd.json_normalize(values)
             .explode('co2')
             .explode('occupancy')
             .explode('temperature')
             .assign(location=key)
             )
        df.append(t)
df = pd.concat(df)
print(df)
co2 occupancy temperature time location
0 9.55934 4 21.0336 2020-09-15T16:15:35.565629 lab1
0 24.1684 15 26.1766 2020-09-15T16:15:36.027525 class1
0 6.63379 1 27.728 2020-09-15T16:15:36.608386 office
The original question did not state an expected result, but this data frame supports many kinds of further analysis.
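The data list in this answer was typed in by hand. To build the same list from text that holds several JSON objects back to back (which is what makes json.loads fail with "Extra data"), one option is json.JSONDecoder.raw_decode, which parses a single value and reports the position where it stopped. This is a minimal sketch: the helper name iter_json_objects is made up for illustration, and the shortened sample string stands in for the contents of data.json.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Shortened stand-in for the file contents (two objects, no separator).
sample = '''{"lab1": {"co2": [9.559335530495726], "occupancy": [4]}}
{"class1": {"co2": [24.168445969175817], "occupancy": [15]}}'''

data = list(iter_json_objects(sample))
print([list(d) for d in data])
```

The resulting list has the same shape as the hand-written data list, so it could be passed to the loop above unchanged.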
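Answer 2: (score: 0)
@Fareed Ahmad posted a larger data set.
First, we create two functions: 1) one that converts the Gist file into a sequence of packets; and 2) one that converts a packet into a data frame: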
import json
import pandas as pd
import requests

def resp_to_packets(resp_text):
    ''' Convert file to list of packets.'''
    packet = ''
    for line in resp_text.split('\n'):
        if line.startswith('}{'):
            # '}{' marks the end of one packet and the start of the next.
            packet += '}'
            yield json.loads(packet)
            packet = '{'
        else:
            packet += line + '\n'
    yield json.loads(packet)

def packet_to_df(packet):
    ''' Convert packet to data frame.'''
    df = list()
    for key, values in packet.items():
        t = (pd.json_normalize(values)
             .explode('co2')
             .explode('occupancy')
             .explode('temperature')
             .assign(location=key)
             )
        df.append(t)
    return pd.concat(df, ignore_index=True)

# Recent pandas versions require an explicit unit on the datetime dtype.
dtypes = {'co2': float, 'occupancy': int, 'temperature': float,
          'time': 'datetime64[ns]', 'location': str}
Second, run the pipeline, which includes concatenating the data frames and converting the types:
url = 'https://gist.githubusercontent.com/Fareed99/45b8a39a7e4493243ec973fa73f2b92b/raw/4b2eb95c24c95d40733f06d13dbf0356c4520e99/data.json'
r = requests.get(url)
assert r.ok
packets = (packet for packet in resp_to_packets(r.text))
dfs = (packet_to_df(packet) for packet in packets)
df = pd.concat(dfs, ignore_index=True).astype(dtype=dtypes)
print(df.tail())
co2 occupancy temperature time location
476 45.237285 15 27.364173 2020-09-15 20:37:29.252201 class1
477 5.565177 4 21.033565 2020-09-15 20:37:29.667347 lab1
478 10.799228 1 21.014435 2020-09-15 20:37:30.689885 lab1
479 36.989700 20 27.059197 2020-09-15 20:37:33.467733 class1
480 1.836340 2 23.021893 2020-09-15 20:37:35.853943 office
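As one illustration of what the combined frame can be used for, a per-location summary can be sketched with groupby. The small frame below is retyped from the five rows of the tail output above instead of being downloaded, and the choice of statistics (mean and max) is arbitrary.

```python
import pandas as pd

# The five rows shown by df.tail() above, retyped by hand (time omitted).
df = pd.DataFrame({
    'co2': [45.237285, 5.565177, 10.799228, 36.989700, 1.836340],
    'occupancy': [15, 4, 1, 20, 2],
    'temperature': [27.364173, 21.033565, 21.014435, 27.059197, 23.021893],
    'location': ['class1', 'lab1', 'lab1', 'class1', 'office'],
})

# Named aggregation: one output column per (source column, function) pair.
summary = df.groupby('location').agg(
    co2_mean=('co2', 'mean'),
    occupancy_max=('occupancy', 'max'),
    temperature_mean=('temperature', 'mean'),
)
print(summary)
```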