How can I determine the number of columns in each row of a CSV file using Python?

Time: 2019-07-06 13:49:54

Tags: python-3.x beautifulsoup xml-parsing export-to-csv

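I am analyzing XML-structured text files about insider trading. I wrote some code that parses the XML structure and writes the output to a CSV file. The results for each file are written to one row, and each piece of parsed information goes into its own column. In some files, however, a piece of information occurs more than once, and my code overwrites the value in the cell, so in the end the cell in my CSV file contains only one date.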

import csv
import glob
import re
import string
import time
import bs4 as bs


# User defined glob pattern for the files to be parsed
# (a raw string cannot end in a backslash, and glob needs a pattern)
TARGET_FILES = r'D:\files\*.*'
# User defined file pointer to LM dictionary

# User defined output file
OUTPUT_FILE =  r'D:\ouput\Parser.csv'
# Setup output
OUTPUT_FIELDS = [r'Datei', 'transactionDate', r'transactionsCode', r'Director', r'Officer', r'Titel', r'10-% Eigner', r'sonstiges', r'SignatureDate']


def main():

    # The with block closes the output file; newline='' leaves line endings to csv
    with open(OUTPUT_FILE, 'w', newline='') as f_out:
        wr = csv.writer(f_out, lineterminator='\n', delimiter=';')
        wr.writerow(OUTPUT_FIELDS)

        file_list = glob.glob(TARGET_FILES)
        for file in file_list:
            print(file)
            with open(file, 'r', encoding='UTF-8', errors='ignore') as f_in:
                soup = bs.BeautifulSoup(f_in, 'xml')

            output_data = get_data(soup)
            output_data[0] = file
            wr.writerow(output_data)


def get_data(soup):

    # Overwrites transactionDate if more than one transaction is disclosed on the current form.
    # The list index determines the output column.


    _odata = [0] * 9

    try:
        for item in soup.find_all('transactionDate'):
            _odata[1] = item.find('value').text               
    except AttributeError:
        _odata[1] = ('keine Angabe')
    try:
        for item in soup.find_all('transactionAcquiredDisposedCode'):
            _odata[2] = item.find('value').text
    except AttributeError:
        _odata[2] = 'ka'
    for item in soup.find_all('reportingOwnerRelationship'):
        try:
            _odata[3] = item.find('isDirector').text
        except AttributeError:
            _odata[3] = ('ka')
        try:
            _odata[4] = item.find('isOfficer').text
        except AttributeError:
            _odata[4] = ('ka')
        try:
            _odata[5] = item.find('officerTitle').text
        except AttributeError:
            _odata[5] = 'ka'
        try:
            _odata[6] = item.find('isTenPercentOwner').text
        except AttributeError:
            _odata[6] = ('ka')
        try:
            _odata[7] = item.find('isOther').text
        except AttributeError:
            _odata[7] = ('ka')
    # The signature is looked up once per document, independent of the owner loop above
    try:
        for item in soup.find_all('ownerSignature'):
            _odata[8] = item.find('signatureDate').text
    except AttributeError:
        _odata[8] = ('ka')

    return _odata


if __name__ == '__main__':
    print('\n' + time.strftime('%c') + '\nGeneric_Parser.py\n')
    main()
    print('\n' + time.strftime('%c') + '\nNormal termination.')

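The code actually works, but it overwrites columns, for example when a file contains several transaction dates. So I need code that automatically uses the next column for each transaction date. How can this be done? I would be very glad if someone could solve my problem. Thank you very much!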

1 Answer:

Answer 0: (score: 0)

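Your problem is that you iterate over the result of soup.find_all() and write to the same position of _odata every time. You need to do something with _odata on each iteration (for example, append each value instead of assigning to a fixed index), otherwise you only keep whatever was written last.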
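A minimal sketch of that idea, under stated assumptions: it uses the standard library's xml.etree.ElementTree instead of BeautifulSoup so that it runs without extra packages, the sample XML is invented (only the tag names come from the question's code), and collect_values and build_row are hypothetical helper names. Each transaction date is appended to the row, so every date ends up in its own column.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the tag names are taken from the question.
SAMPLE_XML = """<ownershipDocument>
  <nonDerivativeTransaction>
    <transactionDate><value>2019-01-02</value></transactionDate>
  </nonDerivativeTransaction>
  <nonDerivativeTransaction>
    <transactionDate><value>2019-01-03</value></transactionDate>
  </nonDerivativeTransaction>
  <ownerSignature><signatureDate>2019-01-04</signatureDate></ownerSignature>
</ownershipDocument>"""


def collect_values(root, tag):
    """Return the <value> text of every <tag> element, in document order."""
    values = []
    for item in root.iter(tag):
        # Append instead of assigning to a fixed index, so nothing is overwritten.
        # 'ka' is the question's placeholder for a missing <value>.
        values.append(item.findtext('value', default='ka'))
    return values


def build_row(name, root):
    """Fixed columns first, then one extra column per transaction date."""
    signature = root.findtext('.//signatureDate', default='ka')
    return [name, signature] + collect_values(root, 'transactionDate')


root = ET.fromstring(SAMPLE_XML)
row = build_row('sample.xml', root)

# Write to an in-memory buffer here; a real script would write to the output file.
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n', delimiter=';').writerow(row)
print(buffer.getvalue())
```

Because the variable-length values are placed at the end of the row, the fixed columns keep their positions. Rows will then differ in length, so the header row would need to be at least as wide as the longest row if every column should have a title.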

If you can show us what the data you are parsing actually looks like, we may be able to give a more specific answer.
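As for the literal question in the title, the column count of each row can be read back with csv.reader. The following is only a sketch: the CSV content is invented, columns_per_row is a hypothetical helper name, and the ';' delimiter is assumed to match the writer in the question.

```python
import csv
import io

# Hypothetical CSV content with rows of different lengths.
SAMPLE_CSV = (
    "Datei;SignatureDate;transactionDate\n"
    "a.xml;2019-01-04;2019-01-02;2019-01-03\n"
    "b.xml;2019-02-01;2019-01-30\n"
)


def columns_per_row(lines, delimiter=';'):
    """Return the number of fields in each row, in file order."""
    return [len(row) for row in csv.reader(lines, delimiter=delimiter)]


counts = columns_per_row(io.StringIO(SAMPLE_CSV))
print(counts)
```

For a file on disk, an open file object (opened with newline='') can be passed in place of the StringIO buffer.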