Pandas DataFrame to a list of dicts in a specific format

Date: 2018-06-13 12:22:01

Tags: python python-2.7 pandas dataframe

I have a DataFrame with the following columns:

serial_no,timestamp,parameter1,parameter2,parameter3,...

The DataFrame can contain several distinct serial_no values, so I need it as JSON in the following format:

[
 {
   'serial_no':'a001',
    'readings':[
      {
       'name':'parameter1',
       'datapoints':[
          ('2018-01-01 00:00:00',5),('2018-01-01 00:01:00',35),..
        ]
      },{'name':'parameter2',..},..
     ]
 },{'serial_no':'a002',..},..
]

Sample table data:

|-----------|-----------------------|------------|------------|------------|
| serial_no | timestamp             | parameter1 | parameter2 | parameter3 |
|-----------|-----------------------|------------|------------|------------|
| a001      | '2018-01-01 00:00:00' | 5          | 4          | 3          |
| a001      | '2018-01-01 00:01:00' | 35         | 7          | 13         |
| a002      | '2018-01-01 00:01:03' | 2          | 6          | 11         |
| a002      | '2018-01-02 05:00:00' | 5          | 16         | 98         |
| a003      | '2018-01-02 05:32:01' | 0          | 1.4        | 3          |
|-----------|-----------------------|------------|------------|------------|

How can I do this?

2 answers:

Answer 0 (score: 2)

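I don't know whether pandas offers a direct way to do this, but you can write a function that builds your specific format for one group and then combine groupby with apply, for example: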

def create_specific_format(df_grouped):
    # df_grouped holds all rows that share one serial_no
    dict_output = {'serial_no': df_grouped['serial_no'].iloc[0]}
    dict_output['readings'] = []
    for col in ['parameter1', 'parameter2', 'parameter3']:
        # Build one (timestamp, value) tuple per row for this parameter
        dict_output['readings'].append({'name': col,
                                        'datapoints': df_grouped.apply(lambda row: (row['timestamp'], row[col]), axis=1).tolist()})
    return dict_output

You can then get the result you want with:

df.groupby('serial_no', as_index=False).apply(create_specific_format).tolist()
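
For anyone who wants to try the idea end to end, here is a minimal self-contained sketch, assuming the sample table from the question. It iterates over the groups directly instead of calling apply, and pairs the columns with zip rather than a row-wise lambda. The function name to_nested_records and its key/time_col parameters are hypothetical names chosen for illustration, not part of pandas.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame(
    [
        ["a001", "2018-01-01 00:00:00", 5, 4, 3],
        ["a001", "2018-01-01 00:01:00", 35, 7, 13],
        ["a002", "2018-01-01 00:01:03", 2, 6, 11],
        ["a002", "2018-01-02 05:00:00", 5, 16, 98],
        ["a003", "2018-01-02 05:32:01", 0, 1.4, 3],
    ],
    columns=["serial_no", "timestamp", "parameter1", "parameter2", "parameter3"],
)


def to_nested_records(frame, key="serial_no", time_col="timestamp"):
    """Hypothetical helper: one dict per key value, one reading per parameter."""
    # Every column that is neither the key nor the timestamp is a parameter
    value_cols = [c for c in frame.columns if c not in (key, time_col)]
    records = []
    for serial, group in frame.groupby(key):
        timestamps = group[time_col].tolist()
        readings = [
            # .tolist() yields plain Python numbers rather than NumPy scalars
            {"name": col, "datapoints": list(zip(timestamps, group[col].tolist()))}
            for col in value_cols
        ]
        records.append({"serial_no": serial, "readings": readings})
    return records


result = to_nested_records(df)
print(result[0])
```

Deriving the parameter columns from the frame avoids hard-coding parameter1 to parameter3, which may help if the real data has more of them.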

Answer 1 (score: 0)

There is no direct way to do this with pandas methods alone, but here is a fairly clean approach:

import pandas as pd

columns = ['serial_no','timestamp','parameter1','parameter2','parameter3']
values = [['a001','2018-01-01 00:00:00',5,14,3],
        ['a001','2018-01-01 00:01:00',35,7,13],
        ['a002','2018-01-01 00:01:03',2,6,11],
        ['a002','2018-01-02 05:00:00',5,16,98],
        ['a003','2018-01-02 05:32:01',0,1.4,3]]

df = pd.DataFrame(values, columns=columns)

p_fields = ['parameter1', 'parameter2', 'parameter3']
serials = []

for sn, data in df.groupby('serial_no'):

    serial = {}
    serial['serial_no'] = sn

    # Associate timestamps with parameter data.
    # list() materializes the pairs; on Python 3, zip alone returns a lazy iterator.
    params = {p: list(zip(data.timestamp, data[p])) for p in p_fields}
    readings = [{'name': p, 'datapoints': params[p]} for p in params]

    serial['readings'] = readings
    serials.append(serial)

serials[0]

{'readings': [{'datapoints': [('2018-01-01 00:00:00', 5),
    ('2018-01-01 00:01:00', 35)],
   'name': 'parameter1'},
  {'datapoints': [('2018-01-01 00:00:00', 3), ('2018-01-01 00:01:00', 13)],
   'name': 'parameter3'},
  {'datapoints': [('2018-01-01 00:00:00', 14.0), ('2018-01-01 00:01:00', 7.0)],
   'name': 'parameter2'}],
 'serial_no': 'a001'}
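
Since the question asks for JSON, the nested list still has to be serialized. Below is a small sketch using the standard json module, assuming a cut-down version of the sample data with a single parameter. Two points to keep in mind: JSON has no tuple type, so each (timestamp, value) pair comes out as a two-element array, and json.dumps rejects NumPy scalars such as numpy.int64, so the values are converted with .tolist() first.

```python
import json

import pandas as pd

# Cut-down sample: one serial number, one parameter
df = pd.DataFrame(
    {
        "serial_no": ["a001", "a001"],
        "timestamp": ["2018-01-01 00:00:00", "2018-01-01 00:01:00"],
        "parameter1": [5, 35],
    }
)

serials = []
for sn, data in df.groupby("serial_no"):
    # .tolist() converts the column values to plain Python types
    datapoints = list(zip(data["timestamp"].tolist(), data["parameter1"].tolist()))
    serials.append(
        {"serial_no": sn, "readings": [{"name": "parameter1", "datapoints": datapoints}]}
    )

payload = json.dumps(serials)   # tuples are written as JSON arrays
decoded = json.loads(payload)   # and come back as Python lists
print(payload)
```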