How can I parse a text file with Python to retrieve specific values?

Time: 2018-02-16 19:46:45

Tags: python parsing

I have a text file with thousands of lines in this general format. Here are the first few:

2   usfptotnap101a  \vol\vol0   \vol\vol0   -2147184536 Different Security Type
2   usfptotnap101a  \vol\vol0\etc   \vol\vol0\etc   -2147184538 Pruned. Different security type
2   usfptotnap101a  \vol\ibd_tot101a_185282 \vol\ibd_tot101a_185282\Shared\MA_REGLL\Project Five    1340    The inherited access control list (ACL) or access control entry (ACE) could not be built.
2   usfptotnap101a  \vol\fi_psc101a_201792  \vol\fi_psc101a_201792\Shared\Global Markets Americas Supervisory Team-REPORTS\Development  1340    The inherited access control list (ACL) or access control entry (ACE) could not be built.
2   usfptotnap101a  \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\GL Test 1340    The inherited access control list (ACL) or access control entry (ACE) could not be built.
2   usfptotnap101a  \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\2013    1340    The inherited access control list (ACL) or access control entry (ACE) could not be built.
2   usfptotnap101a  \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\Interest\2013\December Interest 1340    The inherited access control list (ACL) or access control entry (ACE) could not be built.
2   usfptotnap101a  \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\Interest\2013\October Interest  1340    The inherited access control list (ACL) or access control entry (ACE) could not be built.
2   usfptotnap101a  \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\Interest\2013\November Interest 1340    The inherited access control list (ACL) or access control entry (ACE) could not be built.

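My goal is to collect the first path on each line (\vol\vol0, \vol\vol0\etc, \vol\ibd_tot101a_185282, etc.) together with the last part of each line, which is the error message (Different Security Type, Pruned. Different security type, The inherited access control list (ACL) or access control entry (ACE) could not be built., etc.).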

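I was thinking of parsing on the tabs between each section (note: tabs show up as several spaces on Stack Overflow), but in the first two lines, for example, there is no tab after the error number, which ruins that plan.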

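I also need to build a dictionary that maps each path to its error messages, including the number of occurrences of each combination. For example:

{'\vol\ops_tot101a_185457': {'error 1': number of occurrences..., 'error 2': number of occurrences...}}

...where error 1, error 2, etc. could be keys from a dictionary that maps each error message to a number.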

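I haven't been able to find any solution that fits my specific problem, and I'm very new to coding and Python, but if you have any ideas or know of any modules that might be useful, please let me know! Thank you.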

1 answer:

Answer 0: (score: 0)

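Your deadline has already passed. However, here is a rough approach.

I split each line on whitespace (without assuming that the separators are tabs), then look for the number that marks the beginning of the error message by trying int(item) on each field (this does assume that such a number is present on every line).

You could try this. If it croaks, you could tweak it, or you might have to try something more sophisticated, such as a regex, pyparsing, or another parser.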

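with open('ruth.txt') as ruth:
    for line in ruth:
        items = line.split()
        if len(items) < 3:
            continue  # skip blank or truncated lines
        err = None  # stays None if no error number is found
        # items[2] is the first path; i indexes items[2:], so the number is items[i+2]
        for i, item in enumerate(items[2:]):
            try:
                int(item)  # raises ValueError unless item is an integer
            except ValueError:
                continue
            err = ' '.join(items[i+3:])  # everything after the error number
            break
        print(items[2], err)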

Result:

\vol\vol0 Different Security Type
\vol\vol0\etc Pruned. Different security type
\vol\ibd_tot101a_185282 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\fi_psc101a_201792 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\ops_tot101a_185457 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\ops_tot101a_185457 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\ops_tot101a_185457 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\ops_tot101a_185457 The inherited access control list (ACL) or access control entry (ACE) could not be built.
\vol\ops_tot101a_185457 The inherited access control list (ACL) or access control entry (ACE) could not be built.
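
The question also asks for a dictionary of counts per path and message, and the regex route mentioned earlier is one way to get there. What follows is a minimal sketch rather than a tested solution: the pattern `LINE_RE`, the helpers `parse_line` and `count_errors`, and the `SAMPLE` list (three lines copied from the question) are names introduced here purely for illustration. Like the loop above, it assumes that the first path contains no whitespace and that the error number is the first standalone integer after it, so a folder name such as "Project 5" would confuse it.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern: two leading fields, the first path (no whitespace),
# anything up to the first standalone integer (the error number), then the message.
LINE_RE = re.compile(
    r'^\S+\s+\S+\s+(?P<path>\S+)\s.*?\s(?P<code>-?\d+)\s+(?P<msg>.+)$'
)


def parse_line(line):
    """Return (first_path, message), or None if the line does not fit the pattern."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group('path'), match.group('msg')


def count_errors(lines):
    """Build {path: {message: occurrences}} from an iterable of lines."""
    counts = defaultdict(Counter)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            path, msg = parsed
            counts[path][msg] += 1
    return counts


# Three lines taken from the question, used here instead of reading a file.
SAMPLE = [
    r'2   usfptotnap101a  \vol\vol0   \vol\vol0   -2147184536 Different Security Type',
    r'2   usfptotnap101a  \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\GL Test 1340    The inherited access control list (ACL) or access control entry (ACE) could not be built.',
    r'2   usfptotnap101a  \vol\ops_tot101a_185457 \vol\ops_tot101a_185457\CollateralMgmt\collateral management\PIMCO TBA_TEST\2013    1340    The inherited access control list (ACL) or access control entry (ACE) could not be built.',
]

counts = count_errors(SAMPLE)
for path, messages in counts.items():
    print(path, dict(messages))
```

To run it on the real file, pass the open file object instead of `SAMPLE`, for example `with open('ruth.txt') as ruth: counts = count_errors(ruth)`. Lines that do not match the pattern are skipped silently here; collecting them in a separate list would make it easier to see where the assumptions fail.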