Python text processing / finding data

Date: 2017-06-20 16:08:07

Tags: python parsing text

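I'm trying to use Python to parse/process some information from a text file. The file contains names, employee numbers and other data. I don't know the names or employee numbers in advance. I do know that the text "Per End" comes right after the name, and that the text "File:" comes right before the employee number. I can find these markers with the .find() method. But how do I get Python to look at the information just before "Per End" and just after "File:"? In this particular case, the output should be the name and the employee number.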

The text looks like this:

SMITH, John
Per End: 12/10/2016
File:
002013
Dept:
000400
Rate:10384 60

My code so far:

file = open("Register.txt", "rt")
lines = file.readlines()
file.close()

countPer = 0
for line in lines:
    line = line.strip()
    print (line)
    if line.find('Per End') != -1:
        countPer += 1
print ("Per End #'s: ", countPer)

2 answers:

Answer 0 (score: 1)

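with open("Register.txt", "rt") as file:
    lines = file.readlines()

for indx, line in enumerate(lines):
    line = line.strip()
    print(line)
    # indx > 0 guards against lines[-1] silently wrapping to the last line
    if line.find('Per End') != -1 and indx > 0:
        print(lines[indx-1].strip())   # line before "Per End": the name
    # guard against IndexError when "File:" is the last line
    if line.find('File:') != -1 and indx + 1 < len(lines):
        print(lines[indx+1].strip())   # line after "File:": the employee number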

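enumerate(lines) gives you both the index and the line, so you can also reach the previous line (indx - 1) and the next line (indx + 1).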
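One subtlety with indexing neighbours: when the match is on the first line, `lines[indx-1]` does not fail, it wraps around to the last line, and `lines[indx+1]` raises `IndexError` on the last line. A minimal sketch of a bounds-checked lookup follows; `neighbor` is a hypothetical helper name, not part of the original answer, and the sample list is the one shown in the shell session.

```python
def neighbor(lines, idx, offset):
    """Return the stripped line at idx + offset, or None if it is out of range.

    Hypothetical helper: a plain lines[idx - 1] wraps to the last line when
    idx is 0, and lines[idx + 1] raises IndexError on the last line.
    """
    target = idx + offset
    if 0 <= target < len(lines):
        return lines[target].strip()
    return None


lines = ['SMITH, John\n', 'Per End: 12/10/2016\n', 'File:\n', '002013\n',
         'Dept:\n', '000400\n', 'Rate:10384 60\n']

for indx, line in enumerate(lines):
    if 'Per End' in line:
        print(neighbor(lines, indx, -1))   # the name on the line before
    if 'File:' in line:
        print(neighbor(lines, indx, +1))   # the employee number on the line after
```

With this sample it prints `SMITH, John` and `002013`; a missing neighbour prints `None` instead of the wrong line.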

Here is the output when I run it directly in the Python shell:

>>> file = open("r.txt", "rt")
>>> lines  = file.readlines()
>>> file.close()
>>> lines
['SMITH, John\n', 'Per End: 12/10/2016\n', 'File:\n', '002013\n', 'Dept:\n', '000400\n', 'Rate:10384 60\n']

>>> for indx, line in enumerate(lines):
...     line = line.strip()
...     if line.find('Per End') != -1:
...        print(lines[indx-1].strip())
...     if line.find('File:') != -1:
...        print(lines[indx+1].strip())

SMITH, John
002013
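As a variation on the index-based approach above, here is a minimal sketch that reads the file line by line instead of loading it with `readlines()`. It remembers the previous line for the "Per End" case and uses `next()` on the shared iterator to pull the line after "File:". The generator name `names_and_numbers` is hypothetical, and it assumes (unlike `.find()`) that both markers appear at the start of their lines.

```python
import io


def names_and_numbers(fh):
    """Yield the name (line before 'Per End') and the number (line after 'File:').

    Works on an open file or any iterable of lines; iter() makes sure the for
    loop and next() consume the same iterator.
    """
    it = iter(fh)
    previous = None
    for raw in it:
        line = raw.strip()
        if line.startswith('Per End') and previous is not None:
            yield previous                 # the name sits just above "Per End"
        elif line.startswith('File:'):
            following = next(it, None)     # None if "File:" is the last line
            if following is None:
                break
            line = following.strip()
            yield line                     # the employee number just below "File:"
        previous = line


sample = io.StringIO("SMITH, John\nPer End: 12/10/2016\nFile:\n002013\n"
                     "Dept:\n000400\nRate:10384 60\n")
print(list(names_and_numbers(sample)))  # ['SMITH, John', '002013']
```

With the real file, the same generator can be used as `with open("Register.txt", "rt") as fh:` followed by `for value in names_and_numbers(fh): print(value)`.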

Answer 1 (score: 0)

I would do it like this.

First, some test data.

test = """SMITH, John\n
Per End: 12/10/2016\n
File:\n
002013\n
Dept:\n
000400\n
Rate:10384 60\n"""

text = [line for line in test.splitlines(keepends=False) if line != ""]

Now for the actual question.

count_per, count_num = 0, 0

Using enumerate on an iterable gives you the index automatically.

for idx, line in enumerate(text):

    # Just test whether what you're looking for is in the `str`

    if 'Per End' in line:
        print(text[idx - 1]) # access the full set of lines with idx
        count_per += 1
    if 'File:' in line:
        print(text[idx + 1])
        count_num += 1

print("Per Ends = {}".format(count_per))
print("Files = {}".format(count_num))

This prints:

SMITH, John
002013
Per Ends = 1
Files = 1
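As an alternative to looping over lines, a regular expression can match the whole name / "Per End" / "File:" / number pattern at once, which also collects every record if the register contains many employees. This is a swapped-in technique, not what either answer used, and it assumes the layout shown in the question is fixed: the name line sits directly above "Per End:", and the number is on the line right after "File:". `RECORD` and `parse_register` are hypothetical names.

```python
import re

# Assumed layout (from the sample in the question):
#   <name>
#   Per End: <date>
#   File:
#   <employee number>
RECORD = re.compile(
    r'^(?P<name>[^\n]+)\n'      # name line (must start at a line boundary)
    r'Per End:[^\n]*\n'         # "Per End: <date>" line
    r'File:\s*\n'               # "File:" on its own line
    r'(?P<number>\S+)',         # employee number
    re.MULTILINE,
)


def parse_register(text):
    """Return (name, employee_number) tuples for every record that matches RECORD."""
    return [(m.group('name').strip(), m.group('number'))
            for m in RECORD.finditer(text)]


sample = ("SMITH, John\nPer End: 12/10/2016\nFile:\n002013\n"
          "Dept:\n000400\nRate:10384 60\n")
print(parse_register(sample))  # [('SMITH, John', '002013')]
```

The trade-off is that the pattern is stricter than the `in` test: if the real file has extra lines between these fields, the regex has to be adjusted to match.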