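I have a script for parsing a text file.
The script contains a while loop because a group header can be followed by several lines that belong to it.
My current script is skipping lines. I'm fairly sure it has to do with how and where I use next(), but I can't figure it out.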
Here is an example of the text file:
object-group network TestNetwork1
 description TestDescription
 network-object host TestHost
 network-object host TestHost
 network-object host TestHost
 network-object host TestHost
object-group network TestNetwork2
 description TestDescription
 network-object host TestHost
object-group network TestNetwork3
 description TestDescription
 network-object host TestHost
object-group network TestNetwork4
 description TestDescription
 network-object host TestHost
object-group network TestNetwork5
 description TestDescription
 network-object host TestHost
object-group network TestNetwork6
 description TestDescription
 network-object host TestHost
object-group network TestNetwork7
 description TestDescription
 network-object host TestHost
object-group network TestNetwork8
 description TestDescription
 network-object host TestHost
object-group network TestNetwork9
 description TestDescription
 network-object host TestHost
object-group network TestNetwork10s
 description TestDescription
 network-object host TestHost
Here is the script:
import csv
Count = 0
objects = open("test-object-groups.txt", 'r+')
iobjects = iter(objects)
with open('object-group-test.csv', 'wb+') as filename2:
    writer2 = csv.writer(filename2)
    for lines in iobjects:
        if lines.startswith("object-group network"):
            print lines
            Count += 1
            linesplit = lines.split()
            writer2.writerow([linesplit[2]])
            while True:
                nextline = str(next(iobjects))
                if nextline.startswith(" network-object") or nextline.startswith(" description"):
                    nextlinesplit = nextline.split()
                    if nextlinesplit[1] <> "host" and nextlinesplit[1] <> "object" and nextlinesplit[0] <> "description":
                        writer2.writerow(['','subnet', nextlinesplit[1], nextlinesplit[2]])
                    elif nextlinesplit[1] == "host":
                        writer2.writerow(['',nextlinesplit[1], nextlinesplit[2]])
                    elif nextlinesplit[1] == "object":
                        writer2.writerow(['',nextlinesplit[1], nextlinesplit[2]])
                    elif nextlinesplit[0] == "description":
                        writer2.writerow(['',nextlinesplit[0]])
                elif nextline.startswith("object-group"):
                    break
print Count
The following output shows that it is skipping lines:
object-group network TestNetwork1
object-group network TestNetwork3
object-group network TestNetwork5
object-group network TestNetwork7
object-group network TestNetwork9
5
As you can see above, every other group is being skipped. Any idea how to fix this?
Answer 0 (score: 2)
for lines in iobjects:
    ...
    ...
    while True:
        nextline = str(next(iobjects))
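Of course it skips a line. You call next(iobjects) while you are iterating over iobjects, so every line that next() returns is consumed and never reaches the for loop. In your script the while loop keeps reading until it hits the next "object-group" line and then breaks, but that header line has already been consumed, so the for loop never sees it and every second group is lost.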
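To make that concrete, here is a minimal sketch of the question's parser rewritten as a single pass, so that the for loop is the only thing that ever reads from the file and no header can be swallowed. It is Python 3; the `parse_object_groups` name and the `in_group` flag are hypothetical choices of mine, and the CSV rows simply mirror the `writerow` calls in the question. It is one possible fix, not the only one.

```python
import csv
import io


def parse_object_groups(lines, out):
    """Write one CSV row per group header and one per member line; return the group count.

    Every line is read exactly once by the for loop, so no header line is
    ever consumed early by an inner next() call.
    """
    writer = csv.writer(out)
    count = 0
    in_group = False
    for line in lines:
        fields = line.split()
        if line.startswith("object-group network"):
            # A new header simply starts a new group; nothing is read ahead.
            count += 1
            in_group = True
            writer.writerow([fields[2]])
        elif in_group and fields and fields[0] == "description":
            writer.writerow(["", "description"])
        elif in_group and fields and fields[0] == "network-object":
            if fields[1] in ("host", "object"):
                writer.writerow(["", fields[1], fields[2]])
            else:
                # Assumed "network-object <address> <mask>" form, as handled in the question.
                writer.writerow(["", "subnet", fields[1], fields[2]])
        elif line.startswith("object-group"):
            # Any other kind of object-group ends the current network group.
            in_group = False
    return count


sample = """object-group network TestNetwork1
 description TestDescription
 network-object host TestHost
 network-object host TestHost
object-group network TestNetwork2
 description TestDescription
 network-object host TestHost
"""
buf = io.StringIO()
print(parse_object_groups(io.StringIO(sample), buf))  # → 2
print(buf.getvalue())
```

With real files you would pass `open("test-object-groups.txt")` as `lines` and an output file opened with `open("object-group-test.csv", "w", newline="")` as `out`; `newline=""` is what the Python 3 `csv` documentation recommends for files handed to `csv.writer`.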
Consider this file:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
And this code:
with open('test.txt') as f:
    for line in f:
        print(line)
        if int(line.strip()) % 2 == 0:
            next(f)
The output is:
1
2
4
6
8
10
12
14
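Whenever the number is even we call next(f), which consumes the following line, so every odd number after 1 is never seen by the loop.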
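By contrast, looking at the following line by index on a list (the `f.readlines()` approach suggested below) does not consume anything. A minimal Python 3 sketch, with a hypothetical `walk_with_lookahead` helper and an in-memory list standing in for `test.txt`; every number from 1 to 15 is visited:

```python
def walk_with_lookahead(lines):
    """Visit every line exactly once, peeking at the following line by index."""
    seen = []
    for i, line in enumerate(lines):
        number = int(line)  # int() ignores the trailing newline
        seen.append(number)
        if number % 2 == 0 and i + 1 < len(lines):
            following = lines[i + 1]  # a peek only: the loop still visits it next
            print(f"{number} is even, the next line is {following.strip()}")
    return seen


# Stand-in for `with open('test.txt') as f: lines = f.readlines()`
lines = [f"{n}\n" for n in range(1, 16)]
print(walk_with_lookahead(lines))
```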
Suggested solutions:
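- Use itertools.tee to create two independent iterators. This is probably the least straightforward solution.
- Use f.readlines() and work on the resulting list of lines instead of an iterator. That way you can use indices.
- Use the more-itertools package, which provides a "peekable" iterator: https://stackoverflow.com/a/27698681/1453822
- Don't parse the file line by line. Use a regular expression to extract the information block by block. For example, the regex r'(object-group.*?)(?=$|object-group)' does this (I'm sure it is far from the optimal regex). Make sure to use the re.DOTALL flag so that . also matches newlines.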
import re
with open('test.txt') as f:
    file_content = f.read()
for group in re.findall(r'(object-group.*?)(?=$|object-group)', file_content, re.DOTALL):
    print(group)
# object-group network TestNetwork1
# description TestDescription
# network-object host TestHost
# network-object host TestHost
# network-object host TestHost
# network-object host TestHost
#
# object-group network TestNetwork2
# description TestDescription
# network-object host TestHost
#
# object-group network TestNetwork3
# description TestDescription
# network-object host TestHost
#
# object-group network TestNetwork4
# description TestDescription
# network-object host TestHost
#
# object-group network TestNetwork5
# description TestDescription
# network-object host TestHost
#
# object-group network TestNetwork6
# description TestDescription
# network-object host TestHost
#
# object-group network TestNetwork7
# description TestDescription
# network-object host TestHost
#
# object-group network TestNetwork8
# description TestDescription
# network-object host TestHost
#
# object-group network TestNetwork9
# description TestDescription
# network-object host TestHost
#
# object-group network TestNetwork10s
# description TestDescription
# network-object host TestHost
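The `itertools.tee` suggestion can be sketched like this: `tee` returns two independent iterators over the same data, and advancing one of them by a single item gives a look-ahead copy, so each line can be paired with the line after it without the for loop losing anything. This is Python 3; `with_next` and `groups` are hypothetical helper names, and the sketch assumes group headers start with `object-group network` as in the sample.

```python
from itertools import tee, zip_longest


def with_next(iterable):
    """Yield (current, following) pairs; following is None for the last item."""
    current, ahead = tee(iterable)
    next(ahead, None)  # shift the look-ahead copy one item forward
    return zip_longest(current, ahead)


def groups(lines):
    """Yield (name, member_lines) once per 'object-group network' block."""
    name, members = None, []
    for line, following in with_next(lines):
        if line.startswith("object-group network"):
            name, members = line.split()[2], []
        elif name is not None and line.strip():
            members.append(line.strip())
        # Peek: if the next line opens a new group (or there is none), this block is complete.
        if name is not None and (following is None or following.startswith("object-group")):
            yield name, members
            name = None


sample = """object-group network TestNetwork1
 description TestDescription
 network-object host TestHost
object-group network TestNetwork2
 network-object host TestHost
""".splitlines()
for name, members in groups(sample):
    print(name, members)
# TestNetwork1 ['description TestDescription', 'network-object host TestHost']
# TestNetwork2 ['network-object host TestHost']
```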
As a side note, iobjects = iter(objects) is redundant: the file object returned by open is already an iterator.
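A quick check of that claim (Python 3; the temporary file only exists to keep the snippet self-contained): a file object is its own iterator, which is also why calling `next(f)` inside `for line in f` removes lines from the loop.

```python
import os
import tempfile

# Throwaway file so the snippet runs on its own; its name is irrelevant.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("object-group network TestNetwork1\n network-object host TestHost\n")
    path = tmp.name

try:
    with open(path) as f:
        is_own_iterator = iter(f) is f  # iter() hands back the same file object
        first = next(f)                 # consumes line 1 ...
        rest = list(f)                  # ... so iteration resumes at line 2
finally:
    os.remove(path)

print(is_own_iterator)  # True
print(rest)
```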