Question

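I am trying to read some data from a flat file and display it in another application using Python. My flat file has 12,000 lines and I don't need all of the data, so I need to filter it. Besides other data, one chunk of lines contains 00 and another chunk contains 10. What I want to do is filter out all the lines that contain 10 and keep only the lines that contain 00.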

Below is an updated sample file. I want to filter out all the 10 lines. Note that this is just a sample; my actual flat file has 12,000 lines.

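I just updated my flat file. Here I only want to read the lines that have $ at the beginning, LOB after the $, and 00 before the &. I want to filter out everything else in the flat file.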

$90TM020516 19002200&
$90LOB  0   0   0   7 10  &
$90LOB 25   0   0   6 10  &
$90LOB 57   0   0   6 10  &
$90LOB353   0   0   5 10  &
$90LOB 36   0   0   5 10  &
$90GPSA8   0   38281168  -77448376&
$90LOB276   0   0   5 10  &
$90LOB185   0   0   6 10  &
$90LOB197   0   0   6 00  &
$90LOB198   0 254   6 00  &
$90LOB197   0 254   6 00  &
RSSI $90LOB201   0 254   5 00  &
$90TM020516 19002300&
$90LOB194   0 254   5 00  &
$90LOB190   0 254   5 00  &
$90LOB185   0 254   5 00  &
$90LOB181   0 254   5 00  &
$90LOB187   0 254   5 00  &
$90LOB192   0 254   5 00  &
$90LOB195   0 254   5 00  &
$90LOB195   0 254   5 00  &
$90LOB191   0 254   5 00  &
$90LOB184   0 254   5 00  &
$90LOB177   0 254   5 00  &

Below is the code I am using to read the data:

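for line in lines:
    # Keep only lines that start with '$', carry the LOB tag,
    # and have '00' at indices 22-23
    if line[0] == '$':
        if line[3:6] == 'LOB':
            if line[22:24] == '00':
                pass  # handle the matching line here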

If you want, I can send you the whole flat file. The sample above is just an excerpt.

Answer 1

If I understand your question (and I'm not sure that I do), the lines in your file look like this:

@45   0 0   5 10  *
@45   0 0   5 10  *
@45   0 0   5 10  *
@45   0 0   6 10  *
@45   0 0   6 00  *
@45   0 0   6 00  *
@45   0 0   6 00  *
@45   0 0   5 00  *

...and you only want to read the lines that have 00 and ignore the ones that have 10.

Here is a code sample that does this:

# List containing all the lines you want to save
lines_you_want = []

# Open the file with 12,000 lines in text mode so each line is a str
with open('some.file', 'r') as infile:

    # Go through the file one line at a time
    for line in infile:

        # Check if the 15th character (index 14) is a '0' instead of a '1'
        if len(line) > 14 and line[14] == '0':
            lines_you_want.append(line)

# Do something with lines_you_want

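This assumes that the 00 or 10 is always at the same position in each line (the 15th and 16th characters, i.e. indices 14 and 15) and that those two values are the only ones that can appear there (i.e. not 01, 11, 12, 29, or anything else). Otherwise you will have to adapt it.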
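If the column position cannot be relied on, one alternative is to split each line on whitespace and look at the second-to-last field instead. This is only a rough sketch, not something from the question: the helper name wanted and the inline sample list are made up for illustration, and it assumes the "$90LOB ... 00  &" layout shown in the question's sample.

```python
def wanted(line):
    """Return True for '$..LOB' lines whose second-to-last field is '00'."""
    fields = line.split()
    return (
        len(fields) >= 2
        and line.startswith('$')
        and line[3:6] == 'LOB'
        and fields[-1] == '&'
        and fields[-2] == '00'
    )

# A few lines copied from the question's sample, used here as test input
sample = [
    '$90TM020516 19002200&',
    '$90LOB185   0   0   6 10  &',
    '$90LOB197   0   0   6 00  &',
    'RSSI $90LOB201   0 254   5 00  &',
]
kept = [line for line in sample if wanted(line)]
print(kept)
```

Because the check works on fields rather than character offsets, it keeps working if the number of spaces between columns changes, as long as the flag stays the last field before the &.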

Depending on your application, you may prefer to process each line as you read it instead of building a list. Either way works.
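For the "process each line as you go" variant, a generator is one way to do it. The sketch below is an assumption-laden illustration: matching_lines, its index and flag parameters, and the io.StringIO stand-in for a real file are all hypothetical. Index 14 is the 15th character, which is where the sample lines above differ ('1' versus '0').

```python
import io

def matching_lines(handle, index=14, flag='0'):
    """Yield lines whose character at `index` equals `flag`, one at a time."""
    for line in handle:
        # Skip lines too short to have a character at `index`
        if len(line) > index and line[index] == flag:
            yield line.rstrip('\n')

# io.StringIO stands in for an open file so the sketch runs on its own
demo = io.StringIO('@45   0 0   5 10  *\n@45   0 0   6 00  *\n')
for line in matching_lines(demo):
    print(line)
```

With a real file you would pass the handle from open('some.file') instead of the StringIO object; only one line is held in memory at a time.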

If I have made any wrong assumptions, please leave a comment and I will edit my answer.

Answer 2

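import re
filename = 'path/to/file'  # placeholder: replace with the path to your file
# Keep only lines that start with '$', contain 'LOB', and end with '00  &'
with open(filename) as infile:
    lines = [line.strip() for line in infile if re.match(r'^\$.*LOB.*00  &$', line)]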

A regex101 example

The regex explained:

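^ marks the start of the line. A literal $ character (escaped as \$) must come immediately after the start of the line. Any number of characters can follow until the parser reaches LOB. The same thing happens again for 00, which must be followed by two spaces and an & at the end of the line. If any of those strings is missing, the regex does not match that line.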

So the end result is: $ at the start, LOB after the $, and 00 at the end before the &. Everything else in the file is filtered out.

The result is stored as a list of strings, each string representing one line.
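To see how the pattern behaves, you can run it against a few lines taken from the sample in the question. This is just a quick check, not part of the solution itself; the names pattern, samples and results are made up for the illustration.

```python
import re

# Same pattern as in the answer, compiled once for reuse
pattern = re.compile(r'^\$.*LOB.*00  &$')

samples = [
    '$90LOB197   0   0   6 00  &',       # starts with $, has LOB, ends with 00  &
    '$90LOB185   0   0   6 10  &',       # ends with 10  & instead of 00  &
    '$90TM020516 19002200&',             # no LOB anywhere in the line
    'RSSI $90LOB201   0 254   5 00  &',  # does not start with $
]

# Map each sample line to whether the pattern matches it
results = {text: bool(pattern.match(text)) for text in samples}
for text, matched in results.items():
    print(matched, text)
```

Only the first line should match; the RSSI line is rejected because re.match anchors at the start of the string and the line does not begin with $.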

Bonus: if you want to write the result to another file, you can do the following:

import re
with open("FOO", 'w') as outfile, open('BAR', 'r') as infile:
    for line in infile:
        if re.match(r'^\$.*LOB.*00  &$', line):
            outfile.write(line)

This produces:

$90LOB197   0   0   6 00  &
$90LOB198   0 254   6 00  &
$90LOB197   0 254   6 00  &
$90LOB194   0 254   5 00  &
$90LOB190   0 254   5 00  &
$90LOB185   0 254   5 00  &
$90LOB181   0 254   5 00  &
$90LOB187   0 254   5 00  &
$90LOB192   0 254   5 00  &
$90LOB195   0 254   5 00  &
$90LOB195   0 254   5 00  &
$90LOB191   0 254   5 00  &
$90LOB184   0 254   5 00  &
$90LOB177   0 254   5 00  &

from your sample data.

Parsing data from a flat file
