Question

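I want to convert a particular ASCII file to CSV. The ASCII file has its own specification, which I have posted below the relevant snippets.
Line A starts with the code 66TP: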

66TP        1003    54.437269600149717.012388003107655.5139691177756                :10.008677993245250.01231534739191

Line B starts with C6NM:

C6NM0821.565823793411260.900167346000671.2812114953994820.81007696688170033                                1679475490.0000000001679475527.0000000001

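As shown above, the individual values are not separated by a delimiter; they are distinguished by position and length.

Specification of line A: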

Position  Length  Data format     Description of field

1         2       Type code       Record type code = 66
3         2       Derivation      Derivation code
5         16      Name            Point name
21        16      Latitude        Latitude
37        16      Longitude       Longitude
53        16      Distance        WGS84 ellipsoidal height at APC
69        16      Text 16         Feature code
85        1       GPS Method      Measurement method
86        1       Classification  Classification of the point
87        16      Distance        Horizontal precision
103       16      Distance        Vertical precision

Specification of line B:

Position  Length  Data format     Description of field
1         2       Type code       Record type code = C6
3         2       Derivation      Derivation code
5         2       Integer 2       Minimum number of satellites
7         1       Boolean         Relative DOPs
8         16      Scalar          PDOP (maximum)
24        16      Scalar          HDOP (maximum)
40        16      Scalar          VDOP (maximum)
56        16      Scalar          RMS
72        4       Integer 4       Number of GPS positions used
76        16      Distance        Horizontal standard deviation
92        16      Distance        Vertical standard deviation
108       4       Integer 4       Start GPS week
112       16      Scalar          Start GPS time in seconds to 3dp
128       4       Integer 4       End GPS Week
132       16      Scalar          End GPS time in seconds to 3dp
148       1       Monitor Status

The output I want merges the two lines, like this:

1003,54.4372696001497,17.0123880031076,55.5139691177756,0.009,0.012,8,1.6,0.9,1.28,20.8,033,1679,475490.0,1679,475527.0

Here is the input again, with the individual values marked in square brackets:

66TP        [1003]    [54.4372696001497][17.0123880031076][55.5139691177756]                :1[0.00867799324525][0.01231534739191]

C6NM[082][1.56582379341126][0.90016734600067][1.28121149539948][20.81007696688170][033]                                [1679][475490.000000000][1679][475527.0000000001]

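Sorry for the long post, but I don't know how to describe it more briefly. I am a hobbyist beginner programmer, and I would appreciate any hints to get me started with processing this kind of data.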

Answer 1

Since you know the position of each element on each line, use string slices to grab each element.

For example,

type_code = linea[0:2]
(derivation, name) = (linea[2:4], linea[4:20])
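Note that the specification counts positions from 1 while Python slices count from 0, so position 5 with length 16 becomes linea[4:20]. A minimal sketch of a helper that does this conversion is shown here; the name field and the way the sample line is assembled are illustrative assumptions, not part of the original data format.

```python
def field(line, position, length):
    """Return the field that starts at 1-based `position` and spans `length` characters."""
    start = position - 1  # the specification counts from 1, Python from 0
    return line[start:start + length]


# The sample A line from the question, assembled piece by piece so that
# the width of every field is explicit.
line_a = (
    '66TP'
    + '1003'.rjust(12).ljust(16)   # point name, padded to 16 characters
    + '54.4372696001497'           # latitude
    + '17.0123880031076'           # longitude
    + '55.5139691177756'           # ellipsoidal height
    + ' ' * 16                     # empty feature code
    + ':1'                         # measurement method and classification
    + '0.00867799324525'           # horizontal precision
    + '0.01231534739191'           # vertical precision
)

name = field(line_a, 5, 16).strip()
latitude = float(field(line_a, 21, 16))
horizontal_precision = float(field(line_a, 87, 16))
print(name, latitude, horizontal_precision)
```

Calling float() on a field is only safe when the field is not blank, so empty fields would need a check first.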

To take it further, you could write a small function that splits a line given a list of field lengths.

Code

def split_string_by_position(a_string, lengths):
    result = []
    position = 0
    for length in lengths:
        result.append(a_string[position:position+length])
        position = position+length
    return result


line = '66TP        1003    54.437269600149717.012388003107655.5139691177756'
lengths = [2, 2, 16, 16, 16, 16, 16, 1, 1, 16, 16]  # field lengths from the A-line specification

print(split_string_by_position(line, lengths))

Output

['66', 'TP', '        1003    ', '54.4372696001497', '17.0123880031076', '55.5139691177756', '', '', '', '', '']

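This only returns a list of the data elements. You can take it further by supplying a variable name along with each length ([[2, 'type'], [2, 'derivation'], ...]) and changing the function slightly so that it returns a dict, which you can then access as the_result['variable_name'].

One more thought: http://learnpythonthehardway.org/ would be a good resource for you, so that you learn the basics of the language.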
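A minimal sketch of that dict-returning variant could look like the following. The function name split_named_fields, the field names in A_LAYOUT, and the added .strip() call are assumptions made for illustration; only the lengths come from the A-line specification.

```python
def split_named_fields(a_string, layout):
    """Split a_string into a dict, given a list of (length, name) pairs."""
    result = {}
    position = 0
    for length, name in layout:
        # strip() removes the padding around each fixed-width value
        result[name] = a_string[position:position + length].strip()
        position += length
    return result


# Lengths follow the A-line specification; the names are illustrative.
A_LAYOUT = [
    (2, 'type'), (2, 'derivation'), (16, 'name'),
    (16, 'latitude'), (16, 'longitude'), (16, 'height'),
    (16, 'feature_code'), (1, 'method'), (1, 'classification'),
    (16, 'horizontal_precision'), (16, 'vertical_precision'),
]

# Shortened sample line: it stops after the height field.
line = ('66TP' + '1003'.rjust(12).ljust(16)
        + '54.4372696001497' + '17.0123880031076' + '55.5139691177756')

the_result = split_named_fields(line, A_LAYOUT)
print(the_result['name'], the_result['latitude'])
```

Fields that lie beyond the end of a short line simply come back as empty strings, because slicing past the end of a string does not raise an error.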

Answer 2

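It looks like this is whitespace-separated, in which case you can ignore the column numbers and just use line.split() to get a list of fields.

My to-table program may also help: http://stromberg.dnsalias.org/~strombrg/to-table.html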
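A quick way to check whether this applies is to call line.split() on a sample line and look at the tokens. The sketch below does that for the A line from the question, rebuilt piece by piece as an assumption about its exact spacing. Fields that are written back to back with no space between them come out as a single token, so position-based slicing may still be needed for those.

```python
# Sample A line from the question, assembled so the spacing is explicit.
line = ('66TP' + '1003'.rjust(12).ljust(16)
        + '54.4372696001497' + '17.0123880031076' + '55.5139691177756'
        + ' ' * 16 + ':1' + '0.00867799324525' + '0.01231534739191')

tokens = line.split()  # splits on runs of whitespace only
print(len(tokens))
for token in tokens:
    print(repr(token))
```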

Answer 3

I thought it would be instructive (and possibly useful to someone) to write this with a generator.

First the code:

  1 #!/usr/bin/env python
  2
  3 def parser(text, widths):
  4     '''Generate parsed chunks from text based on a list of field widths'''
  5     position = 0
  6     for width in widths:
  7         yield text[position:position+width]
  8         position = position + width
  9
 10 line = '66TP        1003    54.437269600149717.012388003107655.5139691177756'
 11 lengths = [2, 2, 16, 16, 16, 16, 16, 1, 1, 16, 16]
 12
 13 lines = [ chunk for chunk in parser(line, lengths) ]
 14 print(lines)

Now you can use the parser anywhere an iterator is accepted; for example, line 13 uses it to collect all the strings into a list named lines.

You can also modify the generator in interesting ways, such as adding .strip() to the end of line 7. Each field then has its leading and trailing whitespace removed.

Line 7 modified as described:

        yield text[position:position+width].strip()

You now get the modified output:

['66', 'TP', '1003', '54.4372696001497', '17.0123880031076', '55.5139691177756', '', '', '', '', '']
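Because the generator works on any fixed-width line, the same idea can be stretched to the merged CSV output the question asks for. The following is only a sketch under several assumptions: every 66 line is followed by its C6 line, the widths are the ones in the two specification tables, and the names parse_fixed, WIDTHS and merge_records are made up for this example. It writes all fields except the type and derivation codes, without the rounding or column selection shown in the question.

```python
import csv
import io


def parse_fixed(text, widths):
    """Yield stripped fixed-width chunks of text, one per width."""
    position = 0
    for width in widths:
        yield text[position:position + width].strip()
        position += width


# Field widths taken from the two specification tables in the question.
WIDTHS = {
    '66': [2, 2, 16, 16, 16, 16, 16, 1, 1, 16, 16],
    'C6': [2, 2, 2, 1, 16, 16, 16, 16, 4, 16, 16, 4, 16, 4, 16, 1],
}


def merge_records(lines):
    """Yield one combined row for each 66 line that is followed by a C6 line."""
    pending = None
    for line in lines:
        code = line[:2]
        if code not in WIDTHS:
            continue  # ignore record types this sketch does not know
        fields = list(parse_fixed(line.rstrip('\n'), WIDTHS[code]))
        if code == '66':
            pending = fields
        elif pending is not None:
            # Drop the type and derivation codes of both records.
            yield pending[2:] + fields[2:]
            pending = None


# Sample lines from the question, assembled so every width is explicit.
line_a = ('66TP' + '1003'.rjust(12).ljust(16)
          + '54.4372696001497' + '17.0123880031076' + '55.5139691177756'
          + ' ' * 16 + ':1' + '0.00867799324525' + '0.01231534739191')
line_b = ('C6NM' + '08' + '2'
          + '1.56582379341126' + '0.90016734600067' + '1.28121149539948'
          + '20.8100769668817' + '0033' + ' ' * 32
          + '1679' + '475490.000000000' + '1679' + '475527.000000000' + '1')

rows = list(merge_records([line_a, line_b]))
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

For a real file, the list passed to merge_records could be replaced by an open file object, and the StringIO buffer by a file opened for writing with newline=''.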

Parsing an ASCII file based on position and length using Python
