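I am trying to iterate over a text file block by block (5 blocks) to capture the data as key-value pairs (using dictionaries). However, only the last block ends up in the list of dictionaries, appended 5 times.
I tried moving the dictionary initialization inside the loop, but that did not work.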
    jsonData = list()
    blockIdentifier = set(chr(10))
    filename = "test.txt"
    counter = 0
    lineId = None
    currentJson = {}
    for line in open(filename, 'r', encoding="utf8"):
        '''print(line)
        for ochar in line:
            print(str(ochar)+" - "+str(ord(ochar))),
        break'''
        print(line)
        if set(line).issubset(blockIdentifier):
            jsonData.append(currentJson)
            currentJson.clear()
            counter += 1
        else:
            if ':' in line:
                line = line.strip()
                x = line.split(':', 1)
                currentJson[x[0]] = x[1]
                lineId = x[0]
            elif line.startswith('/s/s/s'):
                line = line.strip()
                currentJson[lineId] += line
            else:
                pass
    print(jsonData)
Text file:
inetnum: 193.194.64.0 - 193.194.95.255
netname: DZ-ARN-970407
descr: PROVIDER
descr: Algerian Academic Research Network
country: DZ
org: ORG-AARN1-AFRINIC
admin-c: EG71
tech-c: EG71
status: ALLOCATED PA
remarks: data has been transferred from RIPE Whois Database 20050221
notify: ***@arn.dz
notify: ***@arn.dz
mnt-by: AFRINIC-HM-MNT
mnt-lower: AS16214-MNT
changed: ***@ripe.net 19970407
changed: ***@ripe.net 19981020
changed: ***@ripe.net 19990104
changed: ***@ripe.net 20000309
changed: ***@ripe.net 20000428
changed: ***@ripe.net 20020313
changed: ***@afrinic.net 20050205
changed: ***@afrinic.net 20121211
changed: ***@afrinic.net 20180212
changed: ***@afrinic.net 20180228
source: AFRINIC

inetnum: 193.95.0.0 - 193.95.127.255
netname: TN-ATI-20010402
descr: Agence Tunisienne Internet - ATI
descr: Provider Local Registry
country: TN
org: ORG-ATIA2-AFRINIC
admin-c: JF13-AFRINIC
tech-c: TG12-AFRINIC
status: ALLOCATED PA
remarks: Previously allocated to eu.eunet
remarks: data has been transferred from RIPE Whois Database 20050221
notify: ***@ati.tn
notify: ***@ati.tn
notify: ***@ati.tn
notify: ***@ati.tn
mnt-by: AFRINIC-HM-MNT
mnt-lower: ATI-MNT
mnt-domains: ATI-MNT
changed: ***@EU.net 19960208
changed: ***@ripe.net 19960513
changed: ***@EU.net 19990201
changed: ***@ripe.net 19990202
changed: ***@EU.net 19990204
changed: ***@ripe.net 20000420
changed: ***@ripe.net 20040226
changed: ***@afrinic.net 20050205
changed: ***@ripe.net 20050218
changed: ***@afrinic.net 20130611
changed: ***@afrinic.net 20161208
changed: ***@afrinic.net 20170214
source: AFRINIC

inetnum: 194.204.192.0 - 194.204.255.255
netname: ONPT
descr: Office National des Postes et Telecommunications
descr: aka Maroc Telecom
country: MA
admin-c: SM13-AFRINIC
tech-c: SM13-AFRINIC
org: ORG-ONdP1-AFRINIC
status: ALLOCATED PA
mnt-by: AFRINIC-HM-MNT
mnt-lower: ONPT-MNT
notify: ***@iam.net.ma
notify: ***@menara.ma
changed: ***@ripe.net 19960111
changed: ***@ripe.net 19980203
changed: ***@ripe.net 19990422
changed: ***@ripe.net 20030106
changed: ***@afrinic.net 20050205
changed: ***@afrinic.net 20060828
changed: ***@afrinic.net 20100118
changed: ***@afrinic.net 20100208
changed: ***@afrinic.net 20100609
changed: ***@afrinic.net 20110602
source: AFRINIC

inetnum: 194.79.96.0 - 194.79.127.255
netname: EG-IE-951129
descr: Internet Egypt Co.
country: EG
org: ORG-NO1-AFRINIC
admin-c: MM2370-AFRINIC
admin-c: IAM13-AFRINIC
tech-c: MM2370-AFRINIC
tech-c: IAM13-AFRINIC
status: ALLOCATED PA
notify: ***@etisalat.com
mnt-by: AFRINIC-HM-MNT
mnt-lower: AS5536-MNT
changed: ***@ripe.net 19951129
changed: ***@ripe.net 19980916
changed: ***@ripe.net 20020215
changed: ***@ripe.net 20020220
changed: ***@afrinic.net 20050205
changed: ***@afrinic.net 20111021
changed: ***@afrinic.net 20180215
source: AFRINIC

inetnum: 195.202.64.0 - 195.202.95.255
netname: MTN-Business
descr: MTN Business
country: KE
admin-c: NA34-AFRINIC
tech-c: NA34-AFRINIC
org: ORG-NOIS1-AFRINIC
status: ALLOCATED PA
mnt-by: AFRINIC-HM-MNT
mnt-lower: AS9129-MNT
remarks: data has been transferred from RIPE Whois Database 20050221
notify: ***@mtnbusiness.co.ke
notify: ***@mtnbusiness.co.ke
changed: ***@ripe.net 19970228
changed: ***@ripe.net 20020312
changed: ***@ripe.net 20020315
changed: ***@afrinic.net 20050205
changed: ***@afrinic.net 20120731
changed: ***@afrinic.net 20120801
changed: ***@afrinic.net 20140801
source: AFRINIC

inetnum: 195.24.192.0 - 195.24.223.255
netname: CM-CAMTEL-970403
descr: Data communication and international
descr: telecommunication of Cameroon
country: CM
org: ORG-IA6-AFRINIC
admin-c: NED2-AFRINIC
tech-c: JN1000-AFRINIC
tech-c: BLV1-AFRINIC
tech-c: TAJJ1-AFRINIC
status: ALLOCATED PA
notify: ***@camnet.cm
notify: ***@camnet.cm
notify: ***@camnet.cm
notify: ***@yahoo.com
mnt-by: AFRINIC-HM-MNT
mnt-lower: CAMTEL-MNT
mnt-routes: CAMTEL-MNT
changed: ***@afrinic.net 20060601
changed: ***@afrinic.net 20060602
changed: ***@afrinic.net 20121213
changed: ***@afrinic.net 20140918
source: AFRINIC
I would like to get the text as key-value pairs, preferably in a list of dictionaries grouped by block.
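Answer 0: (score: 0)
First, let's ignore the grouping into blocks and turn each valid line into a dictionary. We store the result in the variable s.
    filename = 'data.txt'
    s = [ { x.split(':')[0] : x.split(':')[1].strip() } for x in open(filename).read().split('\n') if ':' in x ]
This gives output like the following: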
[{'inetnum': '193.194.64.0 - 193.194.95.255'}, {'netname': 'DZ-ARN-970407'}, {'descr': 'PROVIDER'}, {'descr': 'Algerian Academic Research Network'}, {'country': 'DZ'}, {'org': 'ORG-AARN1-AFRINIC'}, {'admin-c': 'EG71'}, {'tech-c': 'EG71'}, {'status': 'ALLOCATED PA'}, {'remarks': 'data has been transferred from RIPE Whois Database 20050221'}, {'notify': '***@arn.dz'}, {'notify': '***@arn.dz'}, {'mnt-by': 'AFRINIC-HM-MNT'}, {'mnt-lower': 'AS16214-MNT'}, {'changed': '***@ripe.net 19970407'}, {'changed': '***@ripe.net 19981020'}, {'changed': '***@ripe.net 19990104'}, {'changed': '***@ripe.net 20000309'}, {'changed': '***@ripe.net 20000428'}, {'changed': '***@ripe.net 20020313'}, {'changed': '***@afrinic.net 20050205'}, {'changed': '***@afrinic.net 20121211'}, {'changed': '***@afrinic.net 20180212'}, {'changed': '***@afrinic.net 20180228'}, {'source': 'AFRINIC'}, {'inetnum': '193.95.0.0 - 193.95.127.255'}, ... ]
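One caveat with the comprehension above: `x.split(':')[1]` keeps only the text between the first and second colon, so a value that itself contains a colon would be cut short. The sample data has no such value, so the output shown is unaffected. As a minimal sketch, assuming you want to guard against that case, `str.partition` splits on the first colon only. The helper name `parse_line` and the sample string (including its `remarks` line) are made up for illustration.

```python
def parse_line(line):
    """Return a one-item dict for a 'key: value' line (hypothetical helper)."""
    # partition splits at the FIRST colon only, so colons in the value survive
    key, _, value = line.partition(':')
    return {key.strip(): value.strip()}

# Illustrative in-memory sample; the 'remarks' line is invented to show a colon in a value
sample = (
    "inetnum: 193.194.64.0 - 193.194.95.255\n"
    "remarks: see http://example.org\n"
    "\n"
    "netname: ONPT"
)

s_demo = [parse_line(x) for x in sample.split('\n') if ':' in x]
print(s_demo)
```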
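Now we work out where the separators (that is, lines without a ":") are located. We store this in the variable t.
    t = ''.join([ "1" if ':' in x else "0" for x in open(filename).read().split('\n')]).strip("0")
This gives a string in which "1" stands for a regular line and "0" for a separator. The trailing strip("0") drops any separators at the very start or end of the file.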
'111111111111111111111111101111111111111111111111111111111011111111111111111111111101111111111111111111110111111111111111111111011111111111111111111111'
Next, we insert "0" separators into the list of dictionaries according to the pattern stored in t. We call the result w.
    w = [ s.pop(0) if int(x) else "0" for x in t ]
Now we have something like this:
[ ... {'changed': '***@afrinic.net 20170214'}, {'source': 'AFRINIC'}, '0', {'inetnum': '194.204.192.0 - 194.204.255.255'}, {'netname': 'ONPT'}, {'descr': 'Office National des Postes et Telecommunications'}, {'descr': 'aka Maroc Telecom'}, {'country': 'MA'}, ... ]
So we can find the indices of the "0" strings and use them to split this list into a list of lists. We also use filter to remove the "0" separators from the result.
    indices = [ i for (i,x) in enumerate(w) if x == '0' ]
    right = indices + [len(w)]
    left = [0] + indices
    result = [ list(filter(lambda x: x != "0", w[start:end])) for (start,end) in zip(left,right)]
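The same split into blocks can also be sketched with `itertools.groupby`, which groups consecutive lines by whether they contain a colon. This is an alternative to the index-based approach, not part of it. The names `to_pair` and `blocks` and the short in-memory `lines` list are assumptions for illustration.

```python
from itertools import groupby

# Illustrative sample: two blocks separated by one blank line
lines = [
    "inetnum: 193.194.64.0 - 193.194.95.255",
    "netname: DZ-ARN-970407",
    "",
    "inetnum: 193.95.0.0 - 193.95.127.255",
    "netname: TN-ATI-20010402",
]

def to_pair(line):
    """Turn 'key: value' into a one-item dict (hypothetical helper)."""
    key, _, value = line.partition(':')
    return {key.strip(): value.strip()}

# groupby yields runs of consecutive lines sharing the same key value;
# runs where the key is False are the separator lines and are skipped
blocks = [
    [to_pair(line) for line in group]
    for has_colon, group in groupby(lines, key=lambda line: ':' in line)
    if has_colon
]
print(blocks)
```

Because `groupby` only merges consecutive items, several blank lines in a row still count as a single separator.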
So now result holds the output we want, as we can see by running import pprint; pprint.PrettyPrinter().pprint(result):
[[{'inetnum': '193.194.64.0 - 193.194.95.255'},
{'netname': 'DZ-ARN-970407'},
{'descr': 'PROVIDER'},
{'descr': 'Algerian Academic Research Network'},
{'country': 'DZ'},
{'org': 'ORG-AARN1-AFRINIC'},
{'admin-c': 'EG71'},
{'tech-c': 'EG71'},
{'status': 'ALLOCATED PA'},
{'remarks': 'data has been transferred from RIPE Whois Database 20050221'},
{'notify': '***@arn.dz'},
{'notify': '***@arn.dz'},
{'mnt-by': 'AFRINIC-HM-MNT'},
{'mnt-lower': 'AS16214-MNT'},
{'changed': '***@ripe.net 19970407'},
{'changed': '***@ripe.net 19981020'},
{'changed': '***@ripe.net 19990104'},
{'changed': '***@ripe.net 20000309'},
{'changed': '***@ripe.net 20000428'},
{'changed': '***@ripe.net 20020313'},
{'changed': '***@afrinic.net 20050205'},
{'changed': '***@afrinic.net 20121211'},
{'changed': '***@afrinic.net 20180212'},
{'changed': '***@afrinic.net 20180228'},
{'source': 'AFRINIC'}],
[{'inetnum': '193.95.0.0 - 193.95.127.255'},
{'netname': 'TN-ATI-20010402'},
{'descr': 'Agence Tunisienne Internet - ATI'},
{'descr': 'Provider Local Registry'},
{'country': 'TN'},
{'org': 'ORG-ATIA2-AFRINIC'},
{'admin-c': 'JF13-AFRINIC'},
{'tech-c': 'TG12-AFRINIC'},
{'status': 'ALLOCATED PA'},
{'remarks': 'Previously allocated to eu.eunet'},
{'remarks': 'data has been transferred from RIPE Whois Database 20050221'},
{'notify': '***@ati.tn'},
{'notify': '***@ati.tn'},
{'notify': '***@ati.tn'},
{'notify': '***@ati.tn'},
{'mnt-by': 'AFRINIC-HM-MNT'},
{'mnt-lower': 'ATI-MNT'},
{'mnt-domains': 'ATI-MNT'},
{'changed': '***@EU.net 19960208'},
{'changed': '***@ripe.net 19960513'},
{'changed': '***@EU.net 19990201'},
{'changed': '***@ripe.net 19990202'},
{'changed': '***@EU.net 19990204'},
{'changed': '***@ripe.net 20000420'},
{'changed': '***@ripe.net 20040226'},
{'changed': '***@afrinic.net 20050205'},
{'changed': '***@ripe.net 20050218'},
{'changed': '***@afrinic.net 20130611'},
{'changed': '***@afrinic.net 20161208'},
{'changed': '***@afrinic.net 20170214'},
{'source': 'AFRINIC'}],
[{'inetnum': '194.204.192.0 - 194.204.255.255'},
{'netname': 'ONPT'},
{'descr': 'Office National des Postes et Telecommunications'},
{'descr': 'aka Maroc Telecom'},
{'country': 'MA'},
{'admin-c': 'SM13-AFRINIC'},
{'tech-c': 'SM13-AFRINIC'},
{'org': 'ORG-ONdP1-AFRINIC'},
{'status': 'ALLOCATED PA'},
{'mnt-by': 'AFRINIC-HM-MNT'},
{'mnt-lower': 'ONPT-MNT'},
{'notify': '***@iam.net.ma'},
{'notify': '***@menara.ma'},
{'changed': '***@ripe.net 19960111'},
{'changed': '***@ripe.net 19980203'},
{'changed': '***@ripe.net 19990422'},
{'changed': '***@ripe.net 20030106'},
{'changed': '***@afrinic.net 20050205'},
{'changed': '***@afrinic.net 20060828'},
{'changed': '***@afrinic.net 20100118'},
{'changed': '***@afrinic.net 20100208'},
{'changed': '***@afrinic.net 20100609'},
{'changed': '***@afrinic.net 20110602'},
{'source': 'AFRINIC'}],
[{'inetnum': '194.79.96.0 - 194.79.127.255'},
{'netname': 'EG-IE-951129'},
{'descr': 'Internet Egypt Co.'},
{'country': 'EG'},
{'org': 'ORG-NO1-AFRINIC'},
{'admin-c': 'MM2370-AFRINIC'},
{'admin-c': 'IAM13-AFRINIC'},
{'tech-c': 'MM2370-AFRINIC'},
{'tech-c': 'IAM13-AFRINIC'},
{'status': 'ALLOCATED PA'},
{'notify': '***@etisalat.com'},
{'mnt-by': 'AFRINIC-HM-MNT'},
{'mnt-lower': 'AS5536-MNT'},
{'changed': '***@ripe.net 19951129'},
{'changed': '***@ripe.net 19980916'},
{'changed': '***@ripe.net 20020215'},
{'changed': '***@ripe.net 20020220'},
{'changed': '***@afrinic.net 20050205'},
{'changed': '***@afrinic.net 20111021'},
{'changed': '***@afrinic.net 20180215'},
{'source': 'AFRINIC'}],
[{'inetnum': '195.202.64.0 - 195.202.95.255'},
{'netname': 'MTN-Business'},
{'descr': 'MTN Business'},
{'country': 'KE'},
{'admin-c': 'NA34-AFRINIC'},
{'tech-c': 'NA34-AFRINIC'},
{'org': 'ORG-NOIS1-AFRINIC'},
{'status': 'ALLOCATED PA'},
{'mnt-by': 'AFRINIC-HM-MNT'},
{'mnt-lower': 'AS9129-MNT'},
{'remarks': 'data has been transferred from RIPE Whois Database 20050221'},
{'notify': '***@mtnbusiness.co.ke'},
{'notify': '***@mtnbusiness.co.ke'},
{'changed': '***@ripe.net 19970228'},
{'changed': '***@ripe.net 20020312'},
{'changed': '***@ripe.net 20020315'},
{'changed': '***@afrinic.net 20050205'},
{'changed': '***@afrinic.net 20120731'},
{'changed': '***@afrinic.net 20120801'},
{'changed': '***@afrinic.net 20140801'},
{'source': 'AFRINIC'}],
[{'inetnum': '195.24.192.0 - 195.24.223.255'},
{'netname': 'CM-CAMTEL-970403'},
{'descr': 'Data communication and international'},
{'descr': 'telecommunication of Cameroon'},
{'country': 'CM'},
{'org': 'ORG-IA6-AFRINIC'},
{'admin-c': 'NED2-AFRINIC'},
{'tech-c': 'JN1000-AFRINIC'},
{'tech-c': 'BLV1-AFRINIC'},
{'tech-c': 'TAJJ1-AFRINIC'},
{'status': 'ALLOCATED PA'},
{'notify': '***@camnet.cm'},
{'notify': '***@camnet.cm'},
{'notify': '***@camnet.cm'},
{'notify': '***@yahoo.com'},
{'mnt-by': 'AFRINIC-HM-MNT'},
{'mnt-lower': 'CAMTEL-MNT'},
{'mnt-routes': 'CAMTEL-MNT'},
{'changed': '***@afrinic.net 20060601'},
{'changed': '***@afrinic.net 20060602'},
{'changed': '***@afrinic.net 20121213'},
{'changed': '***@afrinic.net 20140918'},
{'source': 'AFRINIC'}]]
Here is all the code we wrote:
    filename = 'data.txt'
    s = [ { x.split(':')[0] : x.split(':')[1].strip() }
          for x in open(filename).read().split('\n') if ':' in x ]
    t = ''.join([ "1" if ':' in x else "0"
                  for x in open(filename).read().split('\n')]).strip("0")
    w = [ s.pop(0) if int(x) else "0"
          for x in t ]
    indices = [ i for (i,x) in enumerate(w) if x == '0' ]
    right = indices + [len(w)]
    left = [0] + indices
    result = [ list(filter(lambda x: x != "0", w[start:end]))
               for (start,end) in zip(left,right)]
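As for why the loop in the question shows the same block repeated: `jsonData.append(currentJson)` stores a reference to the dictionary, not a copy, and `currentJson.clear()` then empties that very object before the next block refills it. Every entry in the list is therefore the same dictionary, and it shows whatever was written into it last. A minimal sketch of a fix, assuming blank lines separate the blocks, binds the name to a fresh dictionary instead of clearing it and also appends the final block, which has no separator after it. Since keys such as `descr` repeat within a block, this sketch collects values in lists rather than overwriting them. The function name `parse_blocks` and the in-memory sample are illustrative and not from the question.

```python
def parse_blocks(lines):
    """Group 'key: value' lines into one dict per blank-line-separated block."""
    blocks = []
    current = {}
    for line in lines:
        if not line.strip():              # blank line: block separator
            if current:
                blocks.append(current)
                current = {}              # new object; do not clear() the appended one
        elif ':' in line:
            key, _, value = line.partition(':')
            # repeated keys (e.g. 'descr') are collected into a list
            current.setdefault(key.strip(), []).append(value.strip())
    if current:                           # last block has no trailing separator
        blocks.append(current)
    return blocks

# Illustrative sample taken from the first lines of two blocks
sample = """inetnum: 193.194.64.0 - 193.194.95.255
descr: PROVIDER
descr: Algerian Academic Research Network

inetnum: 193.95.0.0 - 193.95.127.255
netname: TN-ATI-20010402
"""

result_demo = parse_blocks(sample.splitlines())
print(result_demo)
```

With a real file, something like `with open(filename, encoding="utf8") as f: parse_blocks(f)` should work the same way, since iterating over a file object yields its lines.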