How can I extract data from a text file using Python?

Asked: 2013-01-17 16:16:52

Tags: python

Here is the sample text:

ACCESSION NUMBER:           0001054274-12-000001
CONFORMED SUBMISSION TYPE:  D
PUBLIC DOCUMENT COUNT:      1
ITEM INFORMATION:           Rule 506
FILED AS OF DATE:           20120301
DATE AS OF CHANGE:          20120301
EFFECTIVENESS DATE:         20120301

FILER:

COMPANY DATA:   
    COMPANY CONFORMED NAME:               Alliqua, Inc.
    CENTRAL INDEX KEY:                    0001054274
    STANDARD INDUSTRIAL CLASSIFICATION:   SURGICAL & MEDICAL INSTRUMENTS & APPARATUS [3841]
    IRS NUMBER:                           582349413
    STATE OF INCORPORATION:               FL
    FISCAL YEAR END:                      1220A

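I am trying to extract all the variables (accession number, conformed submission type, ..., fiscal year end) and eventually write them to a .csv file. Any suggestions?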

1 answer:

Answer 0 (score: 3)

I'd split each line on the first : and strip whitespace from the results:

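data = {}
with open(filename) as inputf:  # filename: path to the input text file
    for line in inputf:
        # Skip lines that contain no colon at all.
        if ':' not in line:
            continue
        # Split on the first colon only, then strip whitespace from both parts.
        label, value = map(str.strip, line.split(':', 1))
        # Ignore section headings such as 'FILER:' that have no value.
        if label and value:
            data[label] = value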

This produces the following mapping:

{'ACCESSION NUMBER': '0001054274-12-000001',
 'CENTRAL INDEX KEY': '0001054274',
 'COMPANY CONFORMED NAME': 'Alliqua, Inc.',
 'CONFORMED SUBMISSION TYPE': 'D',
 'DATE AS OF CHANGE': '20120301',
 'EFFECTIVENESS DATE': '20120301',
 'FILED AS OF DATE': '20120301',
 'FISCAL YEAR END': '1220A',
 'IRS NUMBER': '582349413',
 'ITEM INFORMATION': 'Rule 506',
 'PUBLIC DOCUMENT COUNT': '1',
 'STANDARD INDUSTRIAL CLASSIFICATION': 'SURGICAL & MEDICAL INSTRUMENTS & APPARATUS [3841]',
 'STATE OF INCORPORATION': 'FL'}
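
The question also asks about writing the result to a .csv file, which the snippet above does not cover. A minimal sketch of one way to do it with the standard library's csv module follows. The function names parse_header and write_csv and the shortened sample text are illustrative assumptions, not part of the original answer, and the sketch assumes one record (one dict) per input file, written as one CSV row.

```python
import csv
import io


def parse_header(lines):
    """Collect 'LABEL: value' pairs, splitting each line on the first colon."""
    data = {}
    for line in lines:
        label, sep, value = line.partition(':')
        label, value = label.strip(), value.strip()
        # Skip lines without a colon and section headings such as 'FILER:'.
        if sep and label and value:
            data[label] = value
    return data


def write_csv(records, out):
    """Write a list of dicts as CSV rows, one column per label seen."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


# Shortened, hypothetical excerpt of the sample text from the question.
sample = """\
ACCESSION NUMBER:           0001054274-12-000001
FILER:
    COMPANY CONFORMED NAME:               Alliqua, Inc.
"""

record = parse_header(sample.splitlines())

# An in-memory buffer stands in for a real file here; with a file on disk,
# open it with open(path, 'w', newline='') and pass that object instead.
buffer = io.StringIO()
write_csv([record], buffer)
csv_text = buffer.getvalue()
```

Values that contain commas, such as the company name, are quoted automatically by csv.DictWriter. Note that because the data is stored in a dict, a label that appears more than once in a file keeps only its last value.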