Question

I have a text file in Fortran fixed-width format (here are the first 3 lines):

00033+3251 A   B       C?      6.96    5.480" 358  9.12 F0V    0.00        2.28s  1.00: 2MASS, dJ=1.3
00033+3251 Aa  Ab  Aab S1,E    0.62    0.273m   0  9.28 F0V   11.28 K2     1.68*  0.32* SB 1469
00033+3251 Aab Ac  A   E*      4.26    0.076"   0  9.12 F0V    0.00        2.00s  0.28* 2008MNRAS.383.1506

and a description of the file format:

--------------------------------------------------------------------------------
Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
 1- 10  A10   ---     WDS       WDS(J2000)
12- 14  A3    ---     Primary   Designation of the primary
16- 18  A3    ---     Secondary Designation of the secondary component
20- 22  A3    ---     Parent    Designation of the parent (1)
24- 29  A6    ---     Type      Observing technique/status (2)
31- 35  F5.2  d       logP      ? Logarithm (10) of period in days
37- 44  F8.3  ---     Sep       Separation or axis
    45  A1    ---     x_Sep     ['"m] Units of sep. (',",m)
47- 49  I3    deg     PA        Position angle
51- 55  F5.2  mag     Vmag1     V-magnitude of the primary
57- 61  A5    ---     SP1       Spectral type of the primary
63- 67  F5.2  mag     Vmag2     V-magnitude of the secondary
69- 73  A5    ---     SP2       Spectral type of the secondary
75- 79  F5.2  solMass Mass1     Mass of the primary
    80  A1    ---     MCode1    Mass estimation code for primary (3)
82- 86  F5.2  solMass Mass2     Mass of the secondary
    87  A1    ---     MCode2    Mass estimation code for secondary (3)
89-108  A20   ---     Rem       Remark

How can I read this file in Python? I found the read_fwf function in the pandas library.

import pandas as pd

filename = 'systems'
columns = ((0,10),(11,14),(15,18),(19,22),(23,29),(30,35),(36,44),(45,45),(46,49),(50,55),(56,61),(62,67),(68,73),(74,79),(80,80),(81,86),(87,87),(88,108))
data = pd.read_fwf(filename, colspecs = columns, header=None)

Is this the only workable and efficient way? I would like to be able to do this without pandas. Do you have any suggestions?

Answer 1

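     # (start, end) pairs for slicing: start = first byte - 1, end = last byte
     columns = ((0,10),(11,14),(15,18),(19,22),(23,29),(30,35),
                (36,44),(44,45),(46,49),(50,55),(56,61),(62,67),
                (68,73),(74,79),(79,80),(81,86),(86,87),(88,108))
     with open('systems') as file:   # 'systems' is the filename used in the question
         for string in file:
             # one list of raw (unconverted) strings per line
             dataline = [string[start:end] for start, end in columns]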

Note that the column indices are (startbyte-1, endbyte), so a single-character field is, for example, (44,45).
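The (startbyte-1, endbyte) rule can be applied mechanically instead of by hand. A minimal sketch, assuming the byte ranges are copied from the "Bytes" column of the format description; `byte_ranges` and `to_slices` are hypothetical names used only for illustration, and only the first eight fields are listed:

```python
# Byte ranges as printed in the format description (1-based, inclusive).
# Assumption: only the first eight fields are listed, for brevity.
byte_ranges = [(1, 10), (12, 14), (16, 18), (20, 22),
               (24, 29), (31, 35), (37, 44), (45, 45)]

def to_slices(ranges):
    """Convert 1-based inclusive byte ranges to 0-based half-open slices."""
    return [slice(start - 1, end) for start, end in ranges]

# Second sample line from the question, truncated after the x_Sep byte.
line = '00033+3251 Aa  Ab  Aab S1,E    0.62    0.273m'
fields = [line[s].strip() for s in to_slices(byte_ranges)]
print(fields)
```

Writing the ranges exactly as they appear in the description makes the single-byte fields (such as byte 45) harder to get wrong than typing the shifted indices directly.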

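This gives you a list of strings. You will probably want to convert them to floats, integers, and so on; there are many existing questions on that topic.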
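For the conversion step, one option is to pair each slice with a converter and treat blank fields as missing values. A minimal sketch under those assumptions; `FIELDS`, `to_float`, `to_int` and `parse_line` are hypothetical names, and only a subset of the columns from the format description is shown:

```python
def to_float(text):
    """Return a float, or None when the field is blank."""
    text = text.strip()
    return float(text) if text else None

def to_int(text):
    """Return an int, or None when the field is blank."""
    text = text.strip()
    return int(text) if text else None

# (label, start, end, converter) with 0-based half-open indices,
# taken from the byte-by-byte description in the question.
FIELDS = [
    ('WDS',      0, 10, str.strip),
    ('Primary', 11, 14, str.strip),
    ('logP',    30, 35, to_float),
    ('Sep',     36, 44, to_float),
    ('x_Sep',   44, 45, str.strip),
    ('PA',      46, 49, to_int),
    ('Vmag1',   50, 55, to_float),
]

def parse_line(line):
    """Slice one fixed-width line and convert each field."""
    return {label: conv(line[start:end]) for label, start, end, conv in FIELDS}

# Second sample line from the question, truncated after Vmag1.
line = '00033+3251 Aa  Ab  Aab S1,E    0.62    0.273m   0  9.28'
record = parse_line(line)
print(record)
```

Returning None for blank fields is a design choice for this sketch; the description marks logP with "?", which suggests that field may be empty in some rows.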

Answer 2

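You can read this type of file with astropy tables. The header you show looks a lot like a CDS-format ASCII table, which has a dedicated reader implemented for it: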

http://astropy.readthedocs.org/en/latest/api/astropy.io.ascii.Cds.html#astropy.io.ascii.Cds

Answer 3

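There is the fortranformat module, which provides FortranRecordReader, but it copes poorly with the asterisks, comments, and so on that modern Fortran files contain. For a well-behaved file, though, it is useful in combination with namedtuple. For example: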

from collections import namedtuple
from fortranformat import FortranRecordReader

# The format string mirrors the Fortran FORMAT statement that describes one record
fline = FortranRecordReader('(a1,i3,i5,i5,i5,1x,a3,a4,1x,f13.5,f11.5,f11.3,f9.3,1x,a2,f11.3,f9.3,1x,i3,1x,f12.5,f11.5)')
record = namedtuple('nucleo', 'cc NZ  N  Z  A    el  o     massexcess  uncmassex binding uncbind     B  beta  uncbeta    am_int am_float   uncatmass')

with open('AME2012.mas12.ff', 'r') as f:
    for line in f:
        # one namedtuple per line, with fields in the order given by the format
        nucl = record._make(fline.read(line))

You could also try the parse module, or write your own.
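As a rough idea of what a hand-written reader might look like, here is a minimal sketch using only the standard library. It assumes a flat format string containing only `aN`, `iN`, `fN.d` and `Nx` descriptors, with no repeat counts or nested groups; `compile_format` and `read_record` are hypothetical names, and blank numeric fields are returned as None, which is a choice made for this sketch rather than Fortran's own behaviour:

```python
import re

def compile_format(fmt):
    """Turn a simple format such as '(a3,1x,i4,f8.3)' into a list of
    (width, converter) pairs; converter is None for skipped 'x' fields."""
    specs = []
    for token in fmt.strip().strip('()').split(','):
        token = token.strip().lower()
        skip = re.fullmatch(r'(\d*)x', token)
        if skip:
            specs.append((int(skip.group(1) or 1), None))
            continue
        field = re.fullmatch(r'([aif])(\d+)(?:\.\d+)?', token)
        if not field:
            raise ValueError(f'unsupported edit descriptor: {token!r}')
        conv = {'a': str, 'i': int, 'f': float}[field.group(1)]
        specs.append((int(field.group(2)), conv))
    return specs

def read_record(specs, line):
    """Slice one line according to specs and convert each field."""
    values, pos = [], 0
    for width, conv in specs:
        text = line[pos:pos + width]
        pos += width
        if conv is None:          # 'x' descriptor: skip these columns
            continue
        if conv is str:           # character field: keep as-is
            values.append(text)
        else:                     # numeric field: blank becomes None
            text = text.strip()
            values.append(conv(text) if text else None)
    return values

specs = compile_format('(a3,1x,i4,f8.3)')
print(read_record(specs, 'abc   12   3.500'))
```

The sketch ignores the implied decimal scaling that Fortran applies to F fields written without a decimal point, so it is only suitable for files where every real number contains an explicit point.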

How to read a Fortran fixed-width formatted text file in Python?
