How to unpack and split this pandas DataFrame

Time: 2019-02-14 14:25:36

Tags: pandas

I have a DataFrame that looks like this:

+---+-----------+-------------------+--------------+----------------------------------+-------------------+
|   | Parameter | Related_Parameter | StoredLength |            StoredTime            |    StoredValue    |
+---+-----------+-------------------+--------------+----------------------------------+-------------------+
| 0 |       501 |              1005 | 101#102#103  | 2019-01-01#2019-01-02#2019-01-03 | 213.1#214.7#216.7 |
| 1 |       501 |              1005 | 101#102#103  | 2019-01-04#2019-01-05#2019-01-06 | 13123#12313#31232 |
| 2 |       501 |              1005 | 101#102#103  | 2019-01-07#2019-01-08#2019-01-09 | 0#0#0#0           |
| 3 |       502 |              1006 | 110#111#112  | 2019-01-01#2019-01-02#2019-01-03 | 123.1#123.2#123.5 |
+---+-----------+-------------------+--------------+----------------------------------+-------------------+

The StoredLength, StoredTime, and StoredValue columns hold packed values separated by # signs. I want to unpack (split) them and produce a DataFrame like the following:

+------------+------------+-----------+------------+-----------+-------------------+
|    Time    | 501_Length | 501_Value | 502_Length | 502_Value | Related_Parameter |
+------------+------------+-----------+------------+-----------+-------------------+
| 2019-01-01 |        101 |     213.1 |        110 |     123.1 |              1005 |
| 2019-01-02 |        102 |     214.7 |        111 |     123.2 |              1005 |
| 2019-01-03 |        103 |     216.7 |        112 |     123.5 |              1006 |
+------------+------------+-----------+------------+-----------+-------------------+

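That is, for each row I want to:

  • Split StoredTime, StoredLength, and StoredValue
  • Create a Time column, using the elements of the StoredTime list as rows
  • Take the value of Parameter ('x') and create the columns x_Value and x_Length
  • Add the elements of the StoredLength and StoredValue lists as rows in those columns
  • Fill in Related_Parameter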
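The steps above could be sketched roughly as follows. This is only one possible approach under a few assumptions, not a confirmed solution: the names `records`, `long_df`, and `wide` are made up for illustration, extra elements in a packed cell (such as the fourth value in '0#0#0#0') are silently dropped, and Related_Parameter is kept once per parameter (as a hypothetical x_Related_Parameter column) because the same Time can carry different values for 501 and 502. That last point differs from the single Related_Parameter column in the desired output.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Parameter': [501, 501, 501, 502],
    'Related_Parameter': [1005, 1005, 1005, 1006],
    'StoredTime': [
        '2019-01-01#2019-01-02#2019-01-03',
        '2019-01-04#2019-01-05#2019-01-06',
        '2019-01-07#2019-01-08#2019-01-09',
        '2019-01-01#2019-01-02#2019-01-03',
    ],
    'StoredValue': [
        '213.1#214.7#216.7',
        '13123#12313#31232',
        '0#0#0#0',
        '123.1#123.2#123.5',
    ],
    'StoredLength': ['101#102#103'] * 3 + ['110#111#112'],
})

# Step 1: split the packed strings and emit one record per element.
records = []
for row in df1.itertuples(index=False):
    times = row.StoredTime.split('#')
    lengths = row.StoredLength.split('#')
    values = row.StoredValue.split('#')
    # Assumption: zip stops at the shortest list, so surplus elements
    # (the fourth '0' in '0#0#0#0') are dropped rather than raising.
    for time, length, value in zip(times, lengths, values):
        records.append({
            'Time': time,
            'Parameter': row.Parameter,
            'Related_Parameter': row.Related_Parameter,
            'Length': int(length),
            'Value': float(value),
        })
long_df = pd.DataFrame(records)

# Step 2: pivot so that every Parameter gets its own group of columns.
wide = long_df.pivot(
    index='Time',
    columns='Parameter',
    values=['Length', 'Value', 'Related_Parameter'],
)

# Step 3: flatten the (field, parameter) column pairs into 'x_Field' names.
wide.columns = [f'{param}_{field}' for field, param in wide.columns]
wide = wide.sort_index(axis=1)

print(wide)
```

Because parameter 502 has no entries after 2019-01-03, its columns hold NaN for the later dates, and the result has one row per distinct Time across all parameters rather than only the three rows shown in the desired output.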

Here is the code to generate the test frames:

import pandas as pd

df1 = pd.DataFrame({
   'Parameter': [501, 501, 501, 502],
   'Related_Parameter': [1005, 1005, 1005, 1006],
   'StoredTime': [
       '2019-01-01#2019-01-02#2019-01-03',
       '2019-01-04#2019-01-05#2019-01-06',
       '2019-01-07#2019-01-08#2019-01-09',
       '2019-01-01#2019-01-02#2019-01-03'
   ],
   'StoredValue': [
       '213.1#214.7#216.7',
       '13123#12313#31232',
       '0#0#0#0',
       '123.1#123.2#123.5'
   ],
   'StoredLength': ['101#102#103']*3+['110#111#112']
})

df2 = pd.DataFrame({
    'Time': [
        '2019-01-01',
        '2019-01-02',
        '2019-01-03'
    ],
    '501_Length': [101, 102, 103],
    '501_Value': [213.1, 214.7, 216.7],
    '502_Length': [110,111,112],
    '502_Value': [123.1,123.2,123.5],
    'Related_Parameter': [1005,1005,1006],
}).set_index('Time')
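
One detail of the test data worth noting: the third row packs four values into StoredValue ('0#0#0#0') but only three into StoredTime and StoredLength. A minimal sketch for spotting such rows before unpacking, assuming every packed cell is a non-empty string (the names `packed`, `counts`, and `mismatched` are made up for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({
    'StoredTime': [
        '2019-01-01#2019-01-02#2019-01-03',
        '2019-01-04#2019-01-05#2019-01-06',
        '2019-01-07#2019-01-08#2019-01-09',
        '2019-01-01#2019-01-02#2019-01-03',
    ],
    'StoredValue': [
        '213.1#214.7#216.7',
        '13123#12313#31232',
        '0#0#0#0',
        '123.1#123.2#123.5',
    ],
    'StoredLength': ['101#102#103'] * 3 + ['110#111#112'],
})

packed = ['StoredTime', 'StoredLength', 'StoredValue']

# Number of '#'-separated elements in each packed cell.
counts = df1[packed].apply(lambda col: col.str.count('#') + 1)

# Keep only the rows whose three columns disagree on the element count.
mismatched = counts[counts.nunique(axis=1) > 1]

print(mismatched)
```

How to treat such a row (drop the surplus element, pad the shorter lists, or reject the row) is a decision the data owner has to make; the check only makes the inconsistency visible.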

0 answers:

No answers