I am importing data from a PDF that has not been optimised for analysis.
The data has been imported into the following dataframe:
 NaN  NaN  Plant_A  NaN  Plant_B  NaN
 Pre  1,2      1.1  1.2      6.1  6.2
 Pre  3,4      1.3  1.4      6.3  6.4
Post  1,2      2.1  2.2      7.1  7.2
Post  3,4      2.3  2.4      7.3  7.4
and I would like to reorganise it into the following form:
         Pre_1  Pre_2  Pre_3  Pre_4  Post_1  Post_2  Post_3  Post_4
Plant_A    1.1    1.2    1.3    1.4     2.1     2.2     2.3     2.4
Plant_B    6.1    6.2    6.3    6.4     7.1     7.2     7.3     7.4
I started by splitting the second column on commas and combining each piece with the first column to give, for instance, Pre_1 and Pre_2. However, I have struggled to match those labels with the data in the remaining columns, for instance Pre_1 with 1.1 and Pre_2 with 1.2.
Any help would be greatly appreciated.
Answer 0 (score: 3)
I had to make some assumptions about the consistency of your data.
from itertools import cycle
import pandas as pd

tracker = {}
for temporal, spec, *data in df.itertuples(index=False):
    data = data[::-1]  # reverse so that pop() yields values left to right
    cycle_plant = cycle(['Plant_A', 'Plant_B'])
    spec_i = spec.split(',')  # e.g. '1,2' -> ['1', '2']
    while data:
        plant = next(cycle_plant)
        # each plant owns one value per entry in spec_i
        for i in spec_i:
            tracker[(plant, f"{temporal}_{i}")] = data.pop()

pd.Series(tracker).unstack()
         Post_1  Post_2  Post_3  Post_4  Pre_1  Pre_2  Pre_3  Pre_4
Plant_A     2.1     2.2     2.3     2.4    1.1    1.2    1.3    1.4
Plant_B     7.1     7.2     7.3     7.4    6.1    6.2    6.3    6.4
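The unstacked columns come out sorted, so Post precedes Pre, whereas the question asks for Pre first. Here is a minimal end-to-end sketch that rebuilds the frame and restores the requested order. It assumes the Plant_A / Plant_B row is the column header rather than a data row, and the column names `period`, `spec`, `a2` and `b2` are hypothetical placeholders for the unnamed columns.

```python
from itertools import cycle

import pandas as pd

# Rebuild the frame from the question. ASSUMPTION: the Plant_A / Plant_B row
# is the header; 'period', 'spec', 'a2', 'b2' are made-up names for the
# columns that were NaN in the import.
df = pd.DataFrame(
    [
        ["Pre", "1,2", 1.1, 1.2, 6.1, 6.2],
        ["Pre", "3,4", 1.3, 1.4, 6.3, 6.4],
        ["Post", "1,2", 2.1, 2.2, 7.1, 7.2],
        ["Post", "3,4", 2.3, 2.4, 7.3, 7.4],
    ],
    columns=["period", "spec", "Plant_A", "a2", "Plant_B", "b2"],
)

tracker = {}
for temporal, spec, *data in df.itertuples(index=False):
    data = data[::-1]  # reverse so that pop() yields values left to right
    cycle_plant = cycle(["Plant_A", "Plant_B"])
    spec_i = spec.split(",")
    while data:
        plant = next(cycle_plant)
        for i in spec_i:
            tracker[(plant, f"{temporal}_{i}")] = data.pop()

# Column labels in order of first appearance: Pre_1 ... Pre_4, Post_1 ... Post_4
order = list(dict.fromkeys(label for _, label in tracker))

# unstack() pivots the second key level into columns; indexing with `order`
# puts them back in the order they were encountered.
result = pd.Series(tracker).unstack()[order]
print(result)
```

Deriving `order` from the keys of `tracker` avoids hard-coding the labels, so the same reordering should still apply if the PDF yields more periods or more values per row, provided the layout stays consistent.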