Reshape subsections of Pandas DataFrame into a wide format

Time: 2018-08-22 14:00:06

Tags: python pandas

I am importing data from a PDF which has not been optimised for analysis.

The data has been imported into the following dataframe:

NaN   NaN   Plant_A   NaN   Plant_B   NaN
Pre   1,2   1.1       1.2   6.1       6.2
Pre   3,4   1.3       1.4   6.3       6.4
Post  1,2   2.1       2.2   7.1       7.2
Post  3,4   2.3       2.4   7.3       7.4

and I would like to reorganise it into the following form:

            Pre_1   Pre_2   Pre_3   Pre_4  Post_1   Post_2   Post_3   Post_4  
Plant_A       1.1     1.2     1.3     1.4     2.1      2.2      2.3      2.4
Plant_B       6.1     6.2     6.3     6.4     7.1      7.2      7.3      7.4

I started by splitting the second column on commas and combining the pieces with the first column to get labels such as Pre_1 and Pre_2. However, I have struggled to match those labels with the data in the remaining columns, for instance Pre_1 with 1.1 and Pre_2 with 1.2.

Any help would be greatly appreciated.

1 answer:

Answer 0 (score: 3)

I had to make some assumptions about the consistency of your data.

from itertools import cycle
import pandas as pd

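# Maps (plant, label) -> value, e.g. ('Plant_A', 'Pre_1') -> 1.1
tracker = {}

# Each row unpacks as: period ('Pre'/'Post'), spec ('1,2'), then the values
for temporal, spec, *data in df.itertuples(index=False):
  data = data[::-1]  # reverse so that pop() returns values left to right
  cycle_plant = cycle(['Plant_A', 'Plant_B'])
  spec_i = spec.split(',')

  # Consume the values in groups of len(spec_i), one group per plant
  while data:
    plant = next(cycle_plant)
    for i in spec_i:
      tracker[(plant, f"{temporal}_{i}")] = data.pop()

# The tuple keys become a MultiIndex; unstack() moves the labels to columns
pd.Series(tracker).unstack()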

         Post_1  Post_2  Post_3  Post_4  Pre_1  Pre_2  Pre_3  Pre_4
Plant_A     2.1     2.2     2.3     2.4    1.1    1.2    1.3    1.4
Plant_B     7.1     7.2     7.3     7.4    6.1    6.2    6.3    6.4
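
Note that the result above lists the Post columns before the Pre columns, whereas the question asks for Pre first. The following is a minimal, self-contained sketch of the same loop with one possible way to restore that order. It assumes the Plant_A/NaN header row has already been dropped from the frame; the column names `temporal`, `spec`, `a1`, `a2`, `b1` and `b2` are hypothetical placeholders, not names from the original data.

```python
from itertools import cycle

import pandas as pd

# Assumed input: the header row from the PDF has been dropped and the
# columns renamed. These column names are hypothetical.
df = pd.DataFrame(
    [
        ["Pre", "1,2", 1.1, 1.2, 6.1, 6.2],
        ["Pre", "3,4", 1.3, 1.4, 6.3, 6.4],
        ["Post", "1,2", 2.1, 2.2, 7.1, 7.2],
        ["Post", "3,4", 2.3, 2.4, 7.3, 7.4],
    ],
    columns=["temporal", "spec", "a1", "a2", "b1", "b2"],
)

# Same loop as in the answer: map (plant, label) -> value
tracker = {}
for temporal, spec, *data in df.itertuples(index=False):
    data = data[::-1]
    cycle_plant = cycle(["Plant_A", "Plant_B"])
    spec_i = spec.split(",")
    while data:
        plant = next(cycle_plant)
        for i in spec_i:
            tracker[(plant, f"{temporal}_{i}")] = data.pop()

wide = pd.Series(tracker).unstack()

# Reorder the columns by first appearance in the input (Pre_1 ... Post_4).
# dict.fromkeys keeps insertion order while dropping duplicate labels.
ordered = list(dict.fromkeys(label for _, label in tracker))
wide = wide[ordered]

print(wide)
```

The reordering relies on Python dictionaries preserving insertion order, so the labels come back in the order in which the rows were read.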