Collapsing multiple lines when reading a flat file in Python

Asked: 2017-11-13 12:54:29

Tags: python parsing flat-file

I want to parse a flat file in Python that looks like this:

  Element ID     Element Type     Result       Jacobian Sign    

============== ================= ========= =====================
      1            Parabolic      Warning          1.000000     
                  Hexahedron                                    
      2            Parabolic      Warning          1.000000     
                  Hexahedron                                    
      3            Parabolic      Warning          1.000000     
                  Hexahedron                                    
      4            Parabolic      Warning          1.000000     

I tried the mechanism used in this answer, as follows:

import pandas as pd

def parse_file(file):
    col_spec = [(0, 15), (16, 33), (34, 43), (44, 65)]
    return pd.read_fwf(file, colspecs=col_spec)

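However, it reads the first line of each record and the following line, which is empty apart from the word 'Hexahedron' in the Element Type column, as two separate rows.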

>>> data = parse_file("example.txt")
>>> data.head()
       Element ID      Element Type    Result         Jacobian Sign
0             NaN               NaN       NaN                   NaN
1  ==============  ================  ========  ====================
2               1         Parabolic   Warning              1.000000
3             NaN        Hexahedron       NaN                   NaN <= Extra record
4               2         Parabolic   Warning              1.000000

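As the output shows, the first two lines are captured as two records (records 2 and 3). I would like the parser to capture those two lines as a single record, so that the phrase "Parabolic Hexahedron" is captured as the element type. How can I do this?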

1 Answer:

Answer 0: (score: 1)

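Some post-processing should do the trick. Below is some code that uses the shift method. Also note that there is no need to open the file yourself; just pass the file name to pd.read_fwf.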

import pandas as pd

col_spec = [(0, 15), (15, 32), (32, 42), (43, 65)]
df = pd.read_fwf("example.txt", colspecs=col_spec, comment="=")

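# combine rows: on each record row (Element ID present), append the
# Element Type of the row directly below it; other rows keep their own value
df["combined"] = (df['Element Type'] + df['Element Type'].shift(-1)).where(df['Element ID'].notnull(), df['Element Type'])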
# remove extra rows
df = df[df['Element ID'].notnull()]

This should give a DataFrame that looks like this:

  Element ID Element Type   Result Jacobian Sign             combined
2          1    Parabolic  Warning      1.000000  ParabolicHexahedron
4          2    Parabolic  Warning      1.000000  ParabolicHexahedron
6          3    Parabolic  Warning      1.000000  ParabolicHexahedron
8          4    Parabolic  Warning      1.000000  ParabolicHexahedron
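
Note that the code above joins the two fragments without a space ("ParabolicHexahedron") and assumes every record is followed by exactly one continuation line. As a rough sketch of a more tolerant variant, the rows can be grouped by a forward-filled Element ID and the fragments joined with a space. The function name read_collapsed and the in-memory SAMPLE_LINES list are hypothetical names used only for illustration; the sample rows are copied from the question, and dtype=str is an assumption made here to keep every column as text.

```python
import io

import pandas as pd

# Column boundaries taken from the answer above.
COL_SPEC = [(0, 15), (15, 32), (32, 42), (43, 65)]

# Hypothetical in-memory copy of the sample file from the question.
SAMPLE_LINES = [
    "  Element ID     Element Type     Result       Jacobian Sign",
    "",
    "============== ================= ========= =====================",
    "      1            Parabolic      Warning          1.000000",
    "                  Hexahedron",
    "      2            Parabolic      Warning          1.000000",
    "                  Hexahedron",
    "      3            Parabolic      Warning          1.000000",
    "                  Hexahedron",
    "      4            Parabolic      Warning          1.000000",
]


def read_collapsed(lines):
    """Parse fixed-width lines and fold continuation rows into their record."""
    # Drop blank lines and the "====" ruler before pandas sees them.
    kept = [ln for ln in lines if ln.strip() and not ln.lstrip().startswith("=")]
    df = pd.read_fwf(io.StringIO("\n".join(kept)), colspecs=COL_SPEC, dtype=str)

    # Continuation rows have no Element ID; forward-fill so that each one
    # shares the ID of the record line above it.
    key = df["Element ID"].ffill()

    # Join all Element Type fragments of a record with a single space.
    element_type = df.groupby(key, sort=False)["Element Type"].agg(
        lambda s: " ".join(s.dropna())
    )

    # Keep only the record rows and overwrite their Element Type.
    out = df[df["Element ID"].notna()].copy()
    out["Element Type"] = out["Element ID"].map(element_type)
    return out.reset_index(drop=True)


df = read_collapsed(SAMPLE_LINES)
```

On the sample from the question, elements 1 to 3 should end up with "Parabolic Hexahedron", while element 4, which has no continuation line in the excerpt, keeps "Parabolic". For a real file, the lines could be read with open("example.txt").read().splitlines() and passed to the same function; numeric columns would then need an explicit conversion, for example with pd.to_numeric.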