I want to parse a flat file in Python that looks like this:
Element ID     Element Type      Result    Jacobian Sign
============== ================= ========= =====================
1              Parabolic         Warning   1.000000
               Hexahedron
2              Parabolic         Warning   1.000000
               Hexahedron
3              Parabolic         Warning   1.000000
               Hexahedron
4              Parabolic         Warning   1.000000
I tried the mechanism used in this answer, as follows:
import pandas as pd

def parse_file(file):
    col_spec = [(0, 15), (16, 33), (34, 43), (44, 65)]
    return pd.read_fwf(file, colspecs=col_spec)
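However, it reads the first line of each record, and the following line that is empty except for the word 'Hexahedron' in the Element Type column, as two separate rows.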
>>> data = parse_file("example.txt")
>>> data.head()
       Element ID      Element Type    Result         Jacobian Sign
0             NaN               NaN       NaN                   NaN
1  ==============  ================  ========  ====================
2               1         Parabolic   Warning              1.000000
3             NaN        Hexahedron       NaN                   NaN  <= Extra record
4               2         Parabolic   Warning              1.000000
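As the output shows, the first two lines of the data are captured as two records (records 2 and 3). I would like the parser to capture those two lines as a single record, so that the phrase "Parabolic Hexahedron" is captured as the Element Type. How can I do this?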
Answer 0 (score: 1)
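Some post-processing should do the trick. Below is some code that uses the shift method to pull the value from the following row up into the current one. Also note that there is no need to open the file yourself; you can simply pass the filename to pd.read_fwf.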
import pandas as pd
col_spec = [(0, 15), (15, 32), (32, 42), (43, 65)]
df = pd.read_fwf("example.txt", colspecs=col_spec, comment="=")
# combine rows
df["combined"] = (df['Element Type'] + df['Element Type'].shift(-1)).where(df['Element ID'].notnull(), df['Element Type'] )
# remove extra rows
df = df[df['Element ID'].notnull()]
This should give a DataFrame that looks like this:
   Element ID Element Type   Result  Jacobian Sign             combined
2           1    Parabolic  Warning       1.000000  ParabolicHexahedron
4           2    Parabolic  Warning       1.000000  ParabolicHexahedron
6           3    Parabolic  Warning       1.000000  ParabolicHexahedron
8           4    Parabolic  Warning       1.000000  ParabolicHexahedron
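Note that the combined value above has no space between the two words, and that shift(-1) assumes every record is followed by exactly one continuation row. As a possible variation (a sketch, not part of the original answer), the continuation rows can instead be grouped under the forward-filled Element ID and joined with a space. The sample text, the column widths in WIDTHS and COL_SPEC, and the helper names make_line and parse_records are assumptions made for illustration; adjust the column boundaries to match the real file.

```python
import io

import pandas as pd

# Assumed column widths (each includes the separating space); adjust to the real file.
WIDTHS = (15, 18, 10, 21)
COL_SPEC = [(0, 15), (15, 33), (33, 43), (43, 64)]


def make_line(*fields):
    """Pad each field to its column width to build one fixed-width line."""
    return "".join(f.ljust(w) for f, w in zip(fields, WIDTHS)).rstrip()


# Hypothetical in-memory sample that mirrors the layout shown in the question.
SAMPLE = "\n".join([
    make_line("Element ID", "Element Type", "Result", "Jacobian Sign"),
    make_line("=" * 14, "=" * 17, "=" * 9, "=" * 21),
    make_line("1", "Parabolic", "Warning", "1.000000"),
    make_line("", "Hexahedron"),
    make_line("2", "Parabolic", "Warning", "1.000000"),
    make_line("", "Hexahedron"),
    make_line("3", "Parabolic", "Warning", "1.000000"),
    make_line("", "Hexahedron"),
    make_line("4", "Parabolic", "Warning", "1.000000"),
]) + "\n"


def parse_records(source):
    # skiprows=[1] drops the "=====" separator line; line 0 remains the header.
    df = pd.read_fwf(source, colspecs=COL_SPEC, skiprows=[1])
    # Continuation rows have no Element ID, so forward-fill to tie them
    # to the record that precedes them.
    record_id = df["Element ID"].ffill()
    # Join all Element Type fragments of a record with a single space.
    element_type = (
        df["Element Type"].groupby(record_id).agg(lambda s: " ".join(s.dropna()))
    )
    # Keep only the first row of each record and attach the joined text.
    out = df[df["Element ID"].notna()].copy()
    out["Element Type"] = out["Element ID"].map(element_type)
    out["Element ID"] = out["Element ID"].astype(int)
    return out.reset_index(drop=True)


result = parse_records(io.StringIO(SAMPLE))
print(result)
```

Because the fragments are grouped by record rather than taken from the next row, a record with no continuation row (such as record 4 in the sample) keeps its own Element Type instead of borrowing text from the following record.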