I have a text file with thousands of lines in this general format. Here are the first few:
2 usfptotnap101a \vol\vol0 \vol\vol0 -2147184536 Different Security Type
2 usfptotnap101a \vol\vol0\etc \vol\vol0\etc -2147184538 Pruned. Different security type
2 usfptotnap101a \vol\ibd_tot101a_185282 \vol\ibd_tot101a_185282\Shared\MA_REGLL\Project Five 1340 The inherited access control list (ACL) or access control entry (ACE) could not be built.
2 usfptotnap101a \vol\fi_psc101a_201792 \vol\fi_psc101a_201792\Shared\Global Markets Americas Supervisory Team-REPORTS\Development 1340 The inherited access control list (ACL) or access control entry (ACE) could not be built.
2 usfptotnap101a \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\GL Test 1340 The inherited access control list (ACL) or access control entry (ACE) could not be built.
2 usfptotnap101a \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\2013 1340 The inherited access control list (ACL) or access control entry (ACE) could not be built.
2 usfptotnap101a \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\Interest\2013\December Interest 1340 The inherited access control list (ACL) or access control entry (ACE) could not be built.
2 usfptotnap101a \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\Interest\2013\October Interest 1340 The inherited access control list (ACL) or access control entry (ACE) could not be built.
2 usfptotnap101a \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\Interest\2013\November Interest 1340 The inherited access control list (ACL) or access control entry (ACE) could not be built.
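My goal is to collect the first path on each line (\vol\vol0, \vol\vol0\etc, \vol\ibd_tot101a_185282, etc.) together with the last part of each line, which is the error message (Different Security Type, Pruned. Different security type, The inherited access control list (ACL) or access control entry (ACE) could not be built., etc.).
I was thinking of splitting on the tabs between the fields (note: on Stack Overflow the tabs show up as several spaces). However, on the first two lines, for example, there is no tab after the error number, which ruins that plan.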
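Because the separators are not reliably tabs, one option is a regular expression that treats any run of spaces or tabs as a separator and anchors on the signed integer error code. This is a minimal sketch under assumptions drawn only from the sample lines: the layout is level, server, first path, second path, error code, message; the first path contains no whitespace; and the error code is the first standalone integer after the first path. LINE_RE and parse_line are hypothetical names.

```python
import re

# Assumed layout: <level> <server> <first path> <second path> <error code> <message>
# \s+ accepts any mix of spaces and tabs between fields.
# Group 1: first path (assumed to contain no whitespace).
# Group 2: second path, matched lazily because it may contain spaces.
# Group 3: the first standalone (optionally negative) integer, taken as the error code.
# Group 4: everything after the code, taken as the error message.
LINE_RE = re.compile(r'^\S+\s+\S+\s+(\S+)\s+(.*?)\s+(-?\d+)\s+(.*)$')


def parse_line(line):
    """Return (first_path, error_code, message), or None if the line doesn't fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group(1), int(m.group(3)), m.group(4)


# Raw string so that backslash sequences like \v in the paths stay literal.
sample = r'2 usfptotnap101a \vol\vol0 \vol\vol0 -2147184536 Different Security Type'
print(parse_line(sample))
# → ('\\vol\\vol0', -2147184536, 'Different Security Type')
```

Since the second path is matched lazily up to the first standalone integer, a second path ending in a space-separated number (say ...\Q4 2013) would have that number mistaken for the error code; if such folders exist, the pattern needs tightening.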
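In addition, I need to build a dictionary keyed by path that holds the error messages, including the number of occurrences of each path/error combination. For example:
{'\vol\ops_tot101a_185457': {'Error 1': number of occurrences..., 'Error 2': number of occurrences...}}
...where Error 1, Error 2, etc. could come from a dictionary that maps each error message to a number.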
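For the nested dictionary, collections.defaultdict combined with collections.Counter does the counting. This is a minimal sketch that repeats the same regular-expression assumptions about the line layout (first path without spaces, error code as the first standalone integer). The function name summarize and the numbering scheme (1, 2, ... in first-seen order) are hypothetical choices, not something the question prescribes. Messages are compared as exact strings, so "Different Security Type" and "Pruned. Different security type" get separate numbers.

```python
import re
from collections import Counter, defaultdict

# Same assumed layout as before: <level> <server> <first path> <second path> <code> <message>
LINE_RE = re.compile(r'^\S+\s+\S+\s+(\S+)\s+(.*?)\s+(-?\d+)\s+(.*)$')


def summarize(lines):
    """Return (counts, error_ids).

    error_ids maps each distinct message to a number, assigned 1, 2, ... in
    first-seen order; counts maps each first path to {error number: occurrences}.
    """
    error_ids = {}
    counts = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip blank or malformed lines
        path, message = m.group(1), m.group(4)
        # len(error_ids) is evaluated before insertion, so the first message gets 1
        err_id = error_ids.setdefault(message, len(error_ids) + 1)
        counts[path][err_id] += 1
    # Convert the Counters to plain dicts for display
    return {path: dict(c) for path, c in counts.items()}, error_ids


sample_lines = [
    r'2 usfptotnap101a \vol\vol0 \vol\vol0 -2147184536 Different Security Type',
    r'2 usfptotnap101a \vol\vol0\etc \vol\vol0\etc -2147184538 Pruned. Different security type',
    r'2 usfptotnap101a \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\GL Test 1340 The inherited access control list (ACL) or access control entry (ACE) could not be built.',
    r'2 usfptotnap101a \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\2013 1340 The inherited access control list (ACL) or access control entry (ACE) could not be built.',
]
counts, error_ids = summarize(sample_lines)
print(counts)
print(error_ids)
```

With the real file, the open file object can be passed directly, e.g. with open('log.txt') as f: counts, error_ids = summarize(f), where log.txt is a placeholder name.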
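I haven't been able to find a solution that fits my particular problem, and I'm very new to coding and Python, so if you have any ideas or know of any modules that might help, please let me know! Thanks.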
Answer (score: 0)
Your deadline has probably passed by now, but here is a rough approach.
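I split each line on whitespace (without assuming the separators are tabs), then find the number that marks the start of the error message myself, using int(item). (This does assume that such a number is present on every line.)
You could try this. If it croaks, you can adjust it, or you may have to try something more elaborate, such as regular expressions, pyparsing, or another parser.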
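with open('ruth.txt') as ruth:
    for line in ruth:
        items = line.rstrip().split()
        if len(items) < 3:
            continue  # skip blank or truncated lines
        err = None  # reset so a line without a number doesn't reuse the previous message
        # Scan from the first path onward; the first token that parses as an
        # integer is taken to be the error code.
        for i, item in enumerate(items[2:]):
            try:
                int(item)
            except ValueError:
                continue
            # items[2:][i] is items[i + 2], so the message starts at items[i + 3]
            err = ' '.join(items[i + 3:])
            break
        print(items[2], err)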
Result:
\vol\vol0 Different Security Type
\vol\vol0\etc Pruned. Different security type
\vol\ibd_tot101a_185282 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\fi_psc101a_201792 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\ops_tot101a_185457 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\ops_tot101a_185457 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\ops_tot101a_185457 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\ops_tot101a_185457 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\ops_tot101a_185457 The inherited access control list (ACL) or access control entry (ACE) could not be built.