Pandas and large DataFrames

Time: 2016-05-09 00:29:41

Tags: python pandas

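I decided to use pandas (0.18.1) to process the log data from one of my discrete element (particle) models. The log holds the attributes of 400,000 particles (x, y, z positions and velocities; about 5M lines) and has the following structure: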

*****************************************
* Log File Started 16:12:54 Fri May 06 2016
* 4.00-182 (64-bit)
* 
*  
*     
*****************************************
elrond>
Ball_Id 400000
Ballx 4.90707890560e+002
Bally 9.19154644947e+001
Ballz -1.02229145082e+002
Top 0
Dx 1.38904597749e+000
Dy -6.35282219552e-001
Dz -1.64199872399e+001
Velx -1.02171891554e-001
Vely -1.05325799073e-002
Velz 4.04701964190e-003
V_rotx -6.86579713474e-004
V_roty 9.14539972137e-004
V_rotz -7.76239471255e-005
Ball_Id 399999
Ballx 7.48469370428e+002
Bally 2.46351257548e+001
Ballz -8.62490399310e+001
Top 0
Dx 6.96274451933e-001
Dy 1.32036797483e+000
Dz -1.87517847236e+001
Velx -1.05970416552e-002
Vely 7.21491947832e-003
Velz 7.55093644847e-004
V_rotx 5.17377621567e-006
V_roty 2.59041151397e-005
V_rotz -2.31863427848e-005
Ball_Id 399998
Ballx 1.19395239848e+002
Bally 7.80444921824e+001
Ballz 2.34352803814e+000
Top 0
Dx 5.90917177795e+001
Dy 1.37004693793e+000
Dz 1.61822040639e+001
Velx 1.31243808962e+001
Vely -8.20542806383e-001
Velz 6.19737823128e+000
V_rotx -4.89777825136e-002
V_roty 9.36324827264e-002
V_rotz -5.90727285357e-002

I want to get a file in this format:

Ball_Id Ballx   Bally   Ballz   Topo    Dx  Dy  Dz  Velx    Vely    Velz    V_rotx  V_roty  V_rotz
400000  4.90714073236e+002  9.19065373175e+001  -1.02231392317e+002 0   1.39522865407e+000  -6.44209396797e-001 -1.64222344741e+001 2.68881171417e-002  -1.81227520077e-002 -4.04738585013e-003 7.75669240314e-005  -4.00875407555e-004 -1.41810083383e-004
399999  7.48472521138e+002  2.46451444724e+001  -8.62470162686e+001 0   6.99425161310e-001  1.33038669240e+000  -1.87497610612e+001 1.18932839949e-002  4.69256261481e-003  1.38621378252e-002  -6.30154171502e-006 -3.23043526114e-004 2.16368702869e-007
399998  1.28116171848e+002  7.67039376593e+001  7.55623907648e+000  0   6.78126497794e+001  2.94924148016e-002  2.13949151023e+001  6.33940244884e+000  1.73376959946e-001  4.85967665797e+000  -3.52816583310e-001 -5.38872247688e-001 1.12736371677e-001
399996  4.79841096924e+002  -1.62882386399e+002 -1.30791611129e+002 Topo1 2.73837679243e+000    -1.47077675894e+000 -6.28235946603e+000 7.90493795999e-002  -3.39089755154e-002 1.02726075741e-003  -1.14738159279e-004 -7.24753898272e-005 -6.78627383629e-005

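So far I have only managed to write very inefficient code that takes forever to produce the final file. Any suggestions for improving it would be great. Thanks!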

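import pandas as pd
#=================================================================================
# Read the "key value" lines with the key as index, skipping the 8 header lines
# and the 4 footer lines
df = pd.read_csv("Desloc_Caixa_Compress_14_04_16_19.log",index_col=0,header = None, skiprows =[0,1,2,3,4,5,6,7],engine='python',skipfooter = 4, sep=" ")
dados = df[0:14]  # first particle (14 lines)
#=================================================================================
# Append the remaining particles two at a time (rows k:f and m:n) as new columns.
# Note: when the number of remaining particles is odd, the last one is never appended.
k=14; f=28; m=28; n=42
while (n<=len(df)):
    a=df[k:f]
    b=df[m:n]
    k+=28; f+=28
    m+=28; n+=28
    dados = pd.concat([dados,a, b], axis=1)
#=================================================================================
# One row per particle, indexed by Ball_Id, saved as a tab-separated file
d= dados.transpose()
data = d.set_index('Ball_Id')
data.to_csv('Data_14_04_16_19.txt', sep='\t')
#=================================================================================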
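The loop above cuts the data into consecutive 14-line slices and joins them with pd.concat. Calling pd.concat repeatedly inside a loop copies everything accumulated so far on every call, so the runtime grows roughly quadratically with the number of particles. The sketch below is one possible loop-free alternative, not the asker's method: assuming every record has exactly the same 14 keys in the same order (true for the sample shown), the value column can be reshaped into rows of 14 with a single NumPy reshape. The function name records_by_reshape and the LOG_BODY sample are hypothetical, chosen for illustration.

```python
import io

import pandas as pd

N_KEYS = 14  # lines per particle record: Ball_Id, Ballx, ..., V_rotz


def records_by_reshape(source, skiprows=0, skipfooter=0):
    """Fold a 'key value' log into one row per particle (sketch).

    Assumes every record has the same N_KEYS keys in the same order,
    as in the sample shown in the question.
    """
    # skipfooter is only supported by the slower python engine
    engine = "python" if skipfooter else "c"
    raw = pd.read_csv(source, sep=" ", header=None, names=["key", "value"],
                      skiprows=skiprows, skipfooter=skipfooter, engine=engine)
    if len(raw) % N_KEYS:
        raise ValueError("line count is not a multiple of %d" % N_KEYS)
    # Every 14-line block must repeat the key order of the first block
    keys = raw["key"].values.reshape(-1, N_KEYS)
    if not (keys == keys[0]).all():
        raise ValueError("records do not share the same key order")
    # Row i of the reshaped array holds the 14 values of record i
    values = raw["value"].values.reshape(-1, N_KEYS)
    table = pd.DataFrame(values, columns=list(keys[0]))
    table["Ball_Id"] = table["Ball_Id"].astype(int)
    return table.set_index("Ball_Id")


# Hypothetical input: two records from the question, header and footer removed
LOG_BODY = """Ball_Id 400000
Ballx 4.90707890560e+002
Bally 9.19154644947e+001
Ballz -1.02229145082e+002
Top 0
Dx 1.38904597749e+000
Dy -6.35282219552e-001
Dz -1.64199872399e+001
Velx -1.02171891554e-001
Vely -1.05325799073e-002
Velz 4.04701964190e-003
V_rotx -6.86579713474e-004
V_roty 9.14539972137e-004
V_rotz -7.76239471255e-005
Ball_Id 399999
Ballx 7.48469370428e+002
Bally 2.46351257548e+001
Ballz -8.62490399310e+001
Top 0
Dx 6.96274451933e-001
Dy 1.32036797483e+000
Dz -1.87517847236e+001
Velx -1.05970416552e-002
Vely 7.21491947832e-003
Velz 7.55093644847e-004
V_rotx 5.17377621567e-006
V_roty 2.59041151397e-005
V_rotz -2.31863427848e-005
"""

table = records_by_reshape(io.StringIO(LOG_BODY))
print(table[["Ballx", "Velz"]])
# For the real file (8 header lines, 4 footer lines), something like:
# records_by_reshape("Desloc_Caixa_Compress_14_04_16_19.log", skiprows=8, skipfooter=4)
```

The two checks make the function fail loudly instead of silently misaligning columns if a record is incomplete; in that case a key-based approach such as pivot is the safer choice.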

1 answer:

Answer 0 (score: 1)

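You can use df.pivot: number the records with a cumulative count of the Ball_Id lines, then pivot the key/value pairs so that each key becomes a column.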

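import pandas as pd
# Column 0 holds the key, column 1 the value; skip the 8 header lines and the
# 4 footer lines (skipfooter requires engine='python')
df = pd.read_csv("Desloc_Caixa_Compress_14_04_16_19.log", header=None,
                 skiprows=8, engine='python', skipfooter=4, sep=" ")

# Each "Ball_Id" line starts a new record, so the running count numbers the records
df['index'] = (df[0] == 'Ball_Id').cumsum()
# One row per record, one column per key
df = df.pivot(index='index', columns=0, values=1)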

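Output for the three-record sample in the question (the NaN values in the last row appear only because skipfooter=4 drops the last four lines of that sample, which has no footer):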

0       Ball_Id       Ballx      Bally       Ballz         Dx        Dy   
index                                                                     
1      400000.0  490.707891  91.915464 -102.229145   1.389046 -0.635282   
2      399999.0  748.469370  24.635126  -86.249040   0.696274  1.320368   
3      399998.0  119.395240  78.044492    2.343528  59.091718  1.370047   

                                                                          \
0             Dz  Top    V_rotx    V_roty    V_rotz       Velx      Vely   
index                                                                      
1     -16.419987  0.0 -0.000687  0.000915 -0.000078  -0.102172 -0.010533   
2     -18.751785  0.0  0.000005  0.000026 -0.000023  -0.010597  0.007215   
3      16.182204  0.0       NaN       NaN       NaN  13.124381 -0.820543   


0          Velz  
index            
1      0.004047  
2      0.000755  
3           NaN
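The pivot output above still differs from the requested file in three ways: the columns are sorted alphabetically, Ball_Id is a float column rather than the index, and nothing is written to disk. Below is a minimal sketch that closes those gaps, assuming the same 8-line header and space-separated key/value lines; log_to_table and SAMPLE_LOG are hypothetical names, and the shortened sample keeps only five of the fourteen keys.

```python
import io

import pandas as pd


def log_to_table(source, header_lines=8, footer_lines=0):
    """Pivot a 'key value' particle log into one row per Ball_Id (sketch)."""
    raw = pd.read_csv(source, sep=" ", header=None, names=["key", "value"],
                      skiprows=header_lines, skipfooter=footer_lines,
                      engine="python")  # skipfooter needs the python engine
    # Each "Ball_Id" line starts a record, so the running count numbers the records
    raw["record"] = (raw["key"] == "Ball_Id").cumsum()
    wide = raw.pivot(index="record", columns="key", values="value")
    # pivot sorts the columns alphabetically; restore the order in which
    # the keys first appear in the log
    order = raw["key"].drop_duplicates().tolist()
    table = wide.reindex(columns=order).set_index("Ball_Id")
    table.index = table.index.astype(int)
    return table


# Hypothetical, shortened log: the question's 8-line header plus two records
# with only five of the fourteen keys
SAMPLE_LOG = """*****************************************
* Log File Started 16:12:54 Fri May 06 2016
* 4.00-182 (64-bit)
*
*
*
*****************************************
elrond>
Ball_Id 400000
Ballx 4.90707890560e+002
Top 0
Dx 1.38904597749e+000
V_rotz -7.76239471255e-005
Ball_Id 399999
Ballx 7.48469370428e+002
Top 0
Dx 6.96274451933e-001
V_rotz -2.31863427848e-005
"""

table = log_to_table(io.StringIO(SAMPLE_LOG))
out = io.StringIO()
table.to_csv(out, sep="\t")  # tab-separated, Ball_Id as the first column
print(out.getvalue())

# For the real file (4 footer lines, as in the answer), something like:
# log_to_table("Desloc_Caixa_Compress_14_04_16_19.log", footer_lines=4).to_csv(
#     "Data_14_04_16_19.txt", sep="\t")
```

Unlike a fixed-size reshape, pivot matches values by key name, so a record with missing keys produces NaN cells instead of shifting the remaining values into the wrong columns.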