Splitting a DataFrame dynamically

Time: 2019-07-16 13:27:13

Tags: python pandas split

I have 300 DataFrames whose rows are not always evenly distributed. They look like this:

Lags     Rep 1      Rep 2     Rep 3 
12.500000000E-9     7671.039418     6605.763724     10144.873125
25.000000000E-9     -1.000000   -0.479659   1.454251
37.500000000E-9     31.978402   23.456005   29.678136
50.000000000E-9     5.315013    4.723746    0.227125
62.500000000E-9     1.806673    2.642384    2.681376
75.000000000E-9     NaN     NaN     NaN
83.500000000E-9     NaN     NaN     NaN

Time    PhtA count 1     PhtA count 2     PhtA count 3
0.000000000E+0  42.743683   10.890961   12.454987
2.428800000E-3  14.533997   8.125305    7.534027
4.857600000E-3  8.621216    7.686615    7.133484
7.286400000E-3  5.779266    10.147095   6.561279
9.715200000E-3  6.046295    8.201599    5.187988
12.144000000E-3     5.226135    7.343292    5.855560

Time    PhtB count 1     PhtB count 2     PhtB count 3
0.860800000E-3  12.626648   13.580322   8.220673
1.289600000E-3  10.814667   21.381378   7.038116
2.718400000E-3  7.915497    17.261505   7.648468
3.147200000E-3  9.403229    21.266937   10.013580

After splitting, I would ideally end up with three DataFrames like these:

First df:

Lags     Rep 1      Rep 2     Rep 3 
12.500000000E-9     7671.039418     6605.763724     10144.873125
25.000000000E-9     -1.000000   -0.479659   1.454251
37.500000000E-9     31.978402   23.456005   29.678136
50.000000000E-9     5.315013    4.723746    0.227125
62.500000000E-9     1.806673    2.642384    2.681376

Second df:

Time    PhtA count 1     PhtA count 2     PhtA count 3
0.000000000E+0  42.743683   10.890961   12.454987
2.428800000E-3  14.533997   8.125305    7.534027
4.857600000E-3  8.621216    7.686615    7.133484
7.286400000E-3  5.779266    10.147095   6.561279
9.715200000E-3  6.046295    8.201599    5.187988
12.144000000E-3     5.226135    7.343292    5.855560

Third df:

Time    PhtB count 1     PhtB count 2     PhtB count 3
0.860800000E-3  12.626648   13.580322   8.220673
1.289600000E-3  10.814667   21.381378   7.038116
2.718400000E-3  7.915497    17.261505   7.648468
3.147200000E-3  9.403229    21.266937   10.013580

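The three blocks are not always the same length, which is why I am asking for help solving this programmatically. Some details I can give about the df:

  • The first block always ends with a run of rows whose values are NaN (only two in the example)

  • It is followed by two more blocks, each starting with a row of named column headers (Time, PhtA count 1, PhtA count 2, ...)

  • The last two blocks never contain NaN values

  • The number of rows in every block varies, although the headers are always the same

  • There is always exactly one blank line separating the blocks

Any help would be greatly appreciated.

Thanks.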

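1 answer:

Answer 0 (score: 2)

First read all the data into one df while keeping the blank lines, then split it at those blank lines and convert the values to numeric: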

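import pandas as pd

# Keep blank lines so that each one shows up as an all-NaN row marking a block boundary.
df = pd.read_csv('data.csv', sep=r'\s{2,}', skip_blank_lines=False, engine='python')
x = df[df.Lags.isnull()].index.values  # row indices of the blank lines

# Block 1: everything before the first blank line, minus the trailing NaN rows.
df1 = df[0:x[0]].dropna().apply(pd.to_numeric)

# Blocks 2 and 3: the row right after each blank line holds that block's header.
df2 = df[x[0]+2:x[1]].apply(pd.to_numeric)
df2.columns = df.iloc[x[0]+1].values

df3 = df[x[1]+2:].apply(pd.to_numeric)
df3.columns = df.iloc[x[1]+1].values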

Output of print(df1); print(df2); print(df3):

           Lags        Rep 1        Rep 2         Rep 3
0  1.250000e-08  7671.039418  6605.763724  10144.873125
1  2.500000e-08    -1.000000    -0.479659      1.454251
2  3.750000e-08    31.978402    23.456005     29.678136
3  5.000000e-08     5.315013     4.723746      0.227125
4  6.250000e-08     1.806673     2.642384      2.681376
        Time  PhtA count 1  PhtA count 2  PhtA count 3
9   0.000000     42.743683     10.890961     12.454987
10  0.002429     14.533997      8.125305      7.534027
11  0.004858      8.621216      7.686615      7.133484
12  0.007286      5.779266     10.147095      6.561279
13  0.009715      6.046295      8.201599      5.187988
14  0.012144      5.226135      7.343292      5.855560
        Time  PhtB count 1  PhtB count 2  PhtB count 3
17  0.000861     12.626648     13.580322      8.220673
18  0.001290     10.814667     21.381378      7.038116
19  0.002718      7.915497     17.261505      7.648468
20  0.003147      9.403229     21.266937     10.013580
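
The split relies on each blank line being read as a row in which every cell, including Lags, is NaN. The following is a minimal sketch of that mechanism on an in-memory sample rather than on data.csv. The names SAMPLE, raw, blank_rows and next_header are hypothetical and chosen only for illustration, and the sample values are shortened stand-ins for the real data.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: two blocks separated by one blank line.
SAMPLE = (
    "Lags  Rep 1  Rep 2\n"
    "1.25E-8  7671.0  6605.5\n"
    "2.50E-8  NaN  NaN\n"
    "\n"
    "Time  PhtA count 1  PhtA count 2\n"
    "0.0  42.5  10.5\n"
)

raw = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s{2,}",
                  skip_blank_lines=False, engine="python")

# A blank line is the only kind of row whose first column ("Lags") is NaN;
# the NaN-padded data row still has a value in "Lags".
blank_rows = raw.index[raw["Lags"].isna()].tolist()

# The row right after a blank row carries the next block's header as strings.
next_header = raw.iloc[blank_rows[0] + 1].tolist()

print(blank_rows)
print(next_header)
```

Because the second block's header is read as ordinary data, every column holds a mix of strings and numbers, which is why the answer applies pd.to_numeric after slicing.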


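Bonus: a general solution for any number of data blocks in the csv, separated by blank lines (no need to know how many there are beforehand):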

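import pandas as pd

df = pd.read_csv('data.csv', sep=r'\s{2,}', skip_blank_lines=False, engine='python', header=None)
# The sentinels -1 and len(df) let the first and last blocks be sliced like the others.
x = [-1] + list(df[df.iloc[:,0].isnull()].index.values) + [len(df)]
for i in range(1, len(x)):
    # Creates df1, df2, ... as global variables, one per block.
    globals()[f'df{i}'] = df[x[i-1]+2:x[i]].dropna().apply(pd.to_numeric)
    globals()[f'df{i}'].columns = df.iloc[x[i-1]+1].values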
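
Writing into globals() works, but the generated names are hard to loop over afterwards. As an alternative, here is a hedged sketch that returns the blocks in a list instead. It is a variation on the answer, not the answer's own method: the function name split_blocks and the SAMPLE text are hypothetical, and it assumes, as the question states, that blocks are separated by blank lines and that each block starts with its header row.

```python
import io

import pandas as pd


def split_blocks(source):
    """Return one numeric DataFrame per blank-line-separated block."""
    raw = pd.read_csv(source, sep=r"\s{2,}", header=None,
                      skip_blank_lines=False, engine="python")
    # Blank lines come back as rows in which every cell is NaN.
    is_blank = raw.isna().all(axis=1)
    # Counting the blank rows seen so far gives each remaining row a block id.
    block_id = is_blank.cumsum()[~is_blank]
    blocks = []
    for _, block in raw[~is_blank].groupby(block_id):
        header, body = block.iloc[0], block.iloc[1:]
        # Convert to numbers, then drop the NaN-padded rows of the first block.
        body = body.apply(pd.to_numeric).dropna().reset_index(drop=True)
        body.columns = header.tolist()
        blocks.append(body)
    return blocks


# Hypothetical in-memory sample; pass a file path such as 'data.csv' instead.
SAMPLE = (
    "Lags  Rep 1  Rep 2\n"
    "1.25E-8  7671.0  6605.5\n"
    "2.50E-8  NaN  NaN\n"
    "\n"
    "Time  PhtA count 1  PhtA count 2\n"
    "0.0  42.5  10.5\n"
    "2.5E-3  14.5  8.25\n"
)

blocks = split_blocks(io.StringIO(SAMPLE))
for frame in blocks:
    print(frame)
```

Unlike the slices in the answer, this sketch resets each block's index to start at 0; drop the reset_index call if the original row numbers should be kept.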