I have a DataFrame with the following columns:
serial_no,timestamp,parameter1,parameter2,parameter3,...
The DataFrame can contain multiple serial_no values, so I need the JSON in the following format:
[
  {
    'serial_no': 'a001',
    'readings': [
      {
        'name': 'parameter1',
        'datapoints': [
          ('2018-01-01 00:00:00', 5), ('2018-01-01 00:01:00', 35), ..
        ]
      },
      {'name': 'parameter2', ..},
      ..
    ]
  },
  {'serial_no': 'a002', ..},
  ..
]
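Strictly speaking, the structure above is Python notation rather than JSON: JSON has no tuple type and requires double-quoted strings. A minimal check with the standard json module, using a hand-written fragment of the target layout (not real output), shows that each (timestamp, value) tuple is written as a two-element array:

```python
import json

# Hand-written fragment of the desired layout, for illustration only
record = {
    'serial_no': 'a001',
    'readings': [
        {'name': 'parameter1',
         'datapoints': [('2018-01-01 00:00:00', 5), ('2018-01-01 00:01:00', 35)]},
    ],
}

# json.dumps serializes tuples as JSON arrays and quotes strings with "
text = json.dumps([record])
print(text)
```

So a consumer reading the JSON will see each datapoint as a [timestamp, value] array.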
Sample table data:
| serial_no | timestamp             | parameter1 | parameter2 | parameter3 |
|-----------|-----------------------|------------|------------|------------|
| a001      | '2018-01-01 00:00:00' | 5          | 4          | 3          |
| a001      | '2018-01-01 00:01:00' | 35         | 7          | 13         |
| a002      | '2018-01-01 00:01:03' | 2          | 6          | 11         |
| a002      | '2018-01-02 05:00:00' | 5          | 16         | 98         |
| a003      | '2018-01-02 05:32:01' | 0          | 1.4        | 3          |
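To try the answers below, the sample table can be loaded into a DataFrame as sketched here. The timestamps are kept as plain strings, as in the table; parsing them with pd.to_datetime is optional, but the standard json module cannot serialize pandas Timestamp objects without extra handling.

```python
import pandas as pd

# Rows copied from the sample table; timestamps kept as strings
df = pd.DataFrame(
    [['a001', '2018-01-01 00:00:00', 5, 4, 3],
     ['a001', '2018-01-01 00:01:00', 35, 7, 13],
     ['a002', '2018-01-01 00:01:03', 2, 6, 11],
     ['a002', '2018-01-02 05:00:00', 5, 16, 98],
     ['a003', '2018-01-02 05:32:01', 0, 1.4, 3]],
    columns=['serial_no', 'timestamp', 'parameter1', 'parameter2', 'parameter3'],
)
# parameter2 becomes float64 because of the 1.4 value
print(df.dtypes)
```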
How can I do this?
Answer 0 (score: 2)
I don't know whether pandas has a direct way to do this, but you can write a function that builds your specific format and then use groupby and apply, for example:
def create_specific_format(df_grouped):
    # df_grouped holds all rows for a single serial_no
    dict_output = {'serial_no': df_grouped['serial_no'].iloc[0]}
    dict_output['readings'] = []
    for col in ['parameter1', 'parameter2', 'parameter3']:
        # Row-wise apply pairs each timestamp with this column's value
        dict_output['readings'].append({
            'name': col,
            'datapoints': df_grouped.apply(lambda row: (row['timestamp'], row[col]), axis=1).tolist()})
    return dict_output
You can then get what you want with:
df.groupby('serial_no').apply(create_specific_format).tolist()
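A self-contained sketch of Answer 0 on the sample table. One substitution, which is mine and not the answerer's: instead of groupby(...).apply, it calls create_specific_format while iterating over the groups. Iteration always passes the full sub-frame (serial_no column included), whereas pandas 2.2 deprecates passing the grouping column to apply. The function body otherwise follows the answer.

```python
import pandas as pd
from pprint import pprint

df = pd.DataFrame(
    [['a001', '2018-01-01 00:00:00', 5, 4, 3],
     ['a001', '2018-01-01 00:01:00', 35, 7, 13],
     ['a002', '2018-01-01 00:01:03', 2, 6, 11],
     ['a002', '2018-01-02 05:00:00', 5, 16, 98],
     ['a003', '2018-01-02 05:32:01', 0, 1.4, 3]],
    columns=['serial_no', 'timestamp', 'parameter1', 'parameter2', 'parameter3'],
)

def create_specific_format(df_grouped):
    # One output dict per serial_no group
    dict_output = {'serial_no': df_grouped['serial_no'].iloc[0], 'readings': []}
    for col in ['parameter1', 'parameter2', 'parameter3']:
        # Row-wise apply returns a Series of (timestamp, value) tuples
        points = df_grouped.apply(lambda row: (row['timestamp'], row[col]), axis=1)
        dict_output['readings'].append({'name': col, 'datapoints': points.tolist()})
    return dict_output

# Iterating the groupby yields each full sub-frame, grouping column included
result = [create_specific_format(group) for _, group in df.groupby('serial_no')]
pprint(result[0])
```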
Answer 1 (score: 0)
There is no direct way to do this with pandas methods alone, but here is a fairly clean approach:
import pandas as pd

columns = ['serial_no', 'timestamp', 'parameter1', 'parameter2', 'parameter3']
values = [['a001', '2018-01-01 00:00:00', 5, 14, 3],
          ['a001', '2018-01-01 00:01:00', 35, 7, 13],
          ['a002', '2018-01-01 00:01:03', 2, 6, 11],
          ['a002', '2018-01-02 05:00:00', 5, 16, 98],
          ['a003', '2018-01-02 05:32:01', 0, 1.4, 3]]
df = pd.DataFrame(values, columns=columns)

p_fields = ['parameter1', 'parameter2', 'parameter3']
serials = []
for sn, data in df.groupby('serial_no'):
    serial = {'serial_no': sn}
    # Associate timestamps with parameter data; list() is needed because
    # zip() returns a one-shot iterator in Python 3
    params = {p: list(zip(data.timestamp, data[p])) for p in p_fields}
    serial['readings'] = [{'name': p, 'datapoints': params[p]} for p in params]
    serials.append(serial)
serials[0]
{'readings': [{'datapoints': [('2018-01-01 00:00:00', 5),
                              ('2018-01-01 00:01:00', 35)],
               'name': 'parameter1'},
              {'datapoints': [('2018-01-01 00:00:00', 3), ('2018-01-01 00:01:00', 13)],
               'name': 'parameter3'},
              {'datapoints': [('2018-01-01 00:00:00', 14.0), ('2018-01-01 00:01:00', 7.0)],
               'name': 'parameter2'}],
 'serial_no': 'a001'}
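The question's column list ends in "...", so the parameter columns may not be known in advance. The sketch below assumes that every column other than serial_no and timestamp is a parameter, derives the field list from df.columns, and then produces the JSON string with json.dumps. The helper name to_records is hypothetical. Iterating a numeric Series yields plain Python int/float values, which json.dumps accepts without a custom encoder.

```python
import json
import pandas as pd

def to_records(df):
    # Assumption: every column except serial_no and timestamp is a parameter
    p_fields = [c for c in df.columns if c not in ('serial_no', 'timestamp')]
    serials = []
    for sn, data in df.groupby('serial_no'):
        # Same structure as Answer 1, built in one comprehension per serial
        readings = [{'name': p, 'datapoints': list(zip(data['timestamp'], data[p]))}
                    for p in p_fields]
        serials.append({'serial_no': sn, 'readings': readings})
    return serials

df = pd.DataFrame(
    [['a001', '2018-01-01 00:00:00', 5, 14, 3],
     ['a001', '2018-01-01 00:01:00', 35, 7, 13],
     ['a002', '2018-01-01 00:01:03', 2, 6, 11],
     ['a002', '2018-01-02 05:00:00', 5, 16, 98],
     ['a003', '2018-01-02 05:32:01', 0, 1.4, 3]],
    columns=['serial_no', 'timestamp', 'parameter1', 'parameter2', 'parameter3'],
)

# Tuples become [timestamp, value] arrays in the JSON text
json_text = json.dumps(to_records(df), indent=2)
print(json_text)
```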