Converting XML to pandas

Time: 2020-01-25 17:27:50

Tags: python dataframe finance

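Is it possible to convert an XML file (financial statements from the IB API) to pandas without knowing the exact column headers? The rows should reflect the different dates (each column has 4 or more data points). It would also be great to get the balance sheet, income statement, and cash flow statement separately. I tried using Beautiful Soup but got frustrated, because it seems I need to look up each column header specifically, and I don't know how to get the data for each date.

I am trying to get three separate DataFrames (one per financial statement). Sorry, I don't know how to add tables here, but they should look something like this.

DF1 name = income statement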

|---------------------|------------------|------------------|
|      Date           |     SREV         |  VDES            |
|---------------------|------------------|------------------|
|   2018-09-29        | 265595.000000    | 12.208930        |
|---------------------|------------------|------------------|
|  .....              |   ......         | .....            |
|---------------------|------------------|------------------|

The example comes from the following fragment:

    <Statement Type="INC">
                        <FPHeader>
                            <PeriodLength>52</PeriodLength>
                            <periodType Code="W">Weeks</periodType>
                            <UpdateType Code="UPD">Updated Normal</UpdateType>
                            <AccountingStd/>
                            <StatementDate>2018-09-29</StatementDate>
                            <AuditorName Code="EY">Ernst &amp; Young LLP</AuditorName>
                            <AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
                            <Source Date="2018-11-05">10-K</Source>
                        </FPHeader>
                        <lineItem coaCode="SREV">265595.000000</lineItem>
    (...)
                        <lineItem coaCode="VDES">12.208930</lineItem>

Here is the XML file (only about half of it, because of the character limit):

<?xml version="1.0" encoding="utf-8"?>
<ReportFinancialStatements Major="1" Minor="0" Revision="1">
    <CoIDs>
        <CoID Type="RepNo">05680</CoID>
        <CoID Type="CompanyName">Apple Inc.</CoID>
        <CoID Type="IRSNo">942404110</CoID>
        <CoID Type="CIKNo">0000320193</CoID>
    </CoIDs>
    <Issues>
        <Issue Desc="Common Stock" ID="1" Order="1" Type="C">
            <IssueID Type="Name">Ordinary Shares</IssueID>
            <IssueID Type="Ticker">AAPL</IssueID>
            <IssueID Type="RIC">AAPL.O</IssueID>
            <IssueID Type="DisplayRIC">AAPL.OQ</IssueID>
            <IssueID Type="InstrumentPI">331724</IssueID>
            <IssueID Type="QuotePI">7645713</IssueID>
            <Exchange Code="NASD" Country="USA">NASDAQ</Exchange>
            <MostRecentSplit Date="2014-06-09">7.0</MostRecentSplit>
        </Issue>
    </Issues>
    <CoGeneralInfo>
        <CoStatus Code="1">Active</CoStatus>
        <CoType Code="EQU">Equity Issue</CoType>
        <LastModified>2020-01-23</LastModified>
        <LatestAvailableAnnual>2019-09-28</LatestAvailableAnnual>
        <LatestAvailableInterim>2019-09-28</LatestAvailableInterim>
        <ReportingCurrency Code="USD">U.S. Dollars</ReportingCurrency>
        <MostRecentExchange Date="2020-01-22">1.0</MostRecentExchange>
    </CoGeneralInfo>
    <StatementInfo>
        <COAType Code="IND">Industry</COAType>
        <BalanceSheetDisplay Code="CUR">Differentiates</BalanceSheetDisplay>
        <CashFlowMethod Code="IND">Indirect</CashFlowMethod>
    </StatementInfo>
    <Notes>
        <CFAAvailability Code="1"/>
        <IAvailability Code="1"/>
        <ISIAvailability Code="1"/>
        <BSIAvailability Code="1"/>
        <CFIAvailability Code="1"/>
    </Notes>
    <FinancialStatements>
        <COAMap>
            <mapItem coaItem="SREV" lineID="100" precision="1" statementType="INC">Revenue</mapItem>
(...)
            <mapItem coaItem="SCTP" lineID="1050" precision="1" statementType="CAS">Cash Taxes Paid</mapItem>
        </COAMap>
        <AnnualPeriods>
            <FiscalPeriod EndDate="2019-09-28" FiscalYear="2019" Type="Annual">
                <Statement Type="INC">
                    <FPHeader>
                        <PeriodLength>52</PeriodLength>
                        <periodType Code="W">Weeks</periodType>
                        <UpdateType Code="UPD">Updated Normal</UpdateType>
                        <AccountingStd/>
                        <StatementDate>2019-09-28</StatementDate>
                        <AuditorName Code="EY">Ernst &amp; Young LLP</AuditorName>
                        <AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
                        <Source Date="2019-10-31">10-K</Source>
                    </FPHeader>
                    <lineItem coaCode="SREV">260174.000000</lineItem>
(...)
                    <lineItem coaCode="VDES">11.885790</lineItem>
                </Statement>
                <Statement Type="BAL">
                    <FPHeader>
                        <UpdateType Code="UPD">Updated Normal</UpdateType>
                        <StatementDate>2019-09-28</StatementDate>
                        <AuditorName Code="EY">Ernst &amp; Young LLP</AuditorName>
                        <AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
                        <Source Date="2019-10-31">10-K</Source>
                    </FPHeader>
                    <lineItem coaCode="ACSH">12204.000000</lineItem>
(...)
                    <lineItem coaCode="STBP">20.365340</lineItem>
                </Statement>
                <Statement Type="CAS">
                    <FPHeader>
                        <PeriodLength>52</PeriodLength>
                        <periodType Code="W">Weeks</periodType>
                        <UpdateType Code="UPD">Updated Normal</UpdateType>
                        <StatementDate>2019-09-28</StatementDate>
                        <AuditorName Code="EY">Ernst &amp; Young LLP</AuditorName>
                        <AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
                        <Source Date="2019-10-31">10-K</Source>
                    </FPHeader>
                    <lineItem coaCode="ONET">55256.000000</lineItem>
(...)
                    <lineItem coaCode="SNCC">24311.000000</lineItem>
                </Statement>
            </FiscalPeriod>
            <FiscalPeriod EndDate="2018-09-29" FiscalYear="2018" Type="Annual">
                <Statement Type="INC">
                    <FPHeader>
                        <PeriodLength>52</PeriodLength>
                        <periodType Code="W">Weeks</periodType>
                        <UpdateType Code="UPD">Updated Normal</UpdateType>
                        <AccountingStd/>
                        <StatementDate>2018-09-29</StatementDate>
                        <AuditorName Code="EY">Ernst &amp; Young LLP</AuditorName>
                        <AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
                        <Source Date="2018-11-05">10-K</Source>
                    </FPHeader>
                    <lineItem coaCode="SREV">265595.000000</lineItem>
(...)
                    <lineItem coaCode="VDES">12.208930</lineItem>
                </Statement>
                <Statement Type="BAL">
                    <FPHeader>
                        <UpdateType Code="CLA">Reclassified Normal</UpdateType>
                        <StatementDate>2018-12-29</StatementDate>
                        <Source Date="2019-01-30">10-Q</Source>
                    </FPHeader>
                    <lineItem coaCode="ACSH">11575.000000</lineItem>
(...)
                    <lineItem coaCode="STBP">22.533610</lineItem>
                </Statement>
                <Statement Type="CAS">
                    <FPHeader>
                        <PeriodLength>52</PeriodLength>
                        <periodType Code="W">Weeks</periodType>
                        <UpdateType Code="UPD">Updated Normal</UpdateType>
                        <StatementDate>2018-09-29</StatementDate>
                        <AuditorName Code="EY">Ernst &amp; Young LLP</AuditorName>
                        <AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
                        <Source Date="2018-11-05">10-K</Source>
                    </FPHeader>
                    <lineItem coaCode="ONET">59531.000000</lineItem>
(...)
                    <lineItem coaCode="SNCC">5624.000000</lineItem>
                </Statement>
            </FiscalPeriod>
            <FiscalPeriod EndDate="2017-09-30" FiscalYear="2017" Type="Annual">
                <Statement Type="INC">
                    <FPHeader>
                        <PeriodLength>53</PeriodLength>
                        <periodType Code="W">Weeks</periodType>
                        <UpdateType Code="UPD">Updated Normal</UpdateType>
                        <AccountingStd/>
                        <StatementDate>2017-09-30</StatementDate>
                        <AuditorName Code="EY">Ernst &amp; Young LLP</AuditorName>
                        <AuditorOpinion Code="UNQ">Unqualified</AuditorOpinion>
                        <Source Date="2017-11-03">10-K</Source>
                    </FPHeader>
                    <lineItem coaCode="SREV">229234.000000</lineItem>
(...)
                    <lineItem coaCode="VDES">9.206750</lineItem>
                </Statement>
                <Statement Type="BAL">

1 answer:

Answer 0 (score: 0)

I know it isn't pretty, but it works:

from ib_insync import IB, Stock
from bs4 import BeautifulSoup as bs
import pandas as pd

ib = IB()
ib.connect('127.0.0.1', 7497, clientId=1)


security = Stock('AAPL', 'SMART', 'USD')

# request the fundamentals (returned as an XML string)
fundamentals = ib.reqFundamentalData(security, reportType='ReportsFinStatements')

# the 'xml' parser requires lxml to be installed
soup = bs(fundamentals, 'xml')

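# one list of row dicts per statement type
bal_l = []
inc_l = []
cas_l = []

for period in soup.find_all('FiscalPeriod'):
    # note: this keeps only the non-annual periods; use == "Annual" for annual statements
    if period.get('Type') != "Annual":
        for statement in period.find_all('Statement'):
            # skip statements flagged as reclassified (UpdateType Code="CLA")
            if statement.find('UpdateType').get('Code') != 'CLA':
                dic = {}

                t = statement.get('Type')
                d = statement.find('Source').get('Date')   # Date attribute of the source filing
                d1 = statement.find('StatementDate').text  # statement date of the period
                dic['date'] = d
                dic['StatementDate'] = d1

                # every coaCode becomes a column, so no header has to be known in advance
                for item in statement.find_all('lineItem'):
                    dic[item.get('coaCode')] = item.text

                if t == 'BAL':
                    bal_l.append(dic)
                    print(t, d, dic)
                elif t == 'INC':
                    inc_l.append(dic)
                elif t == 'CAS':
                    cas_l.append(dic)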

balancesheet = pd.DataFrame(bal_l).sort_values('date')

with pd.option_context('display.max_rows', 1000, 'display.max_columns', None):
    print(balancesheet)
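
The script above only builds the balance sheet and leaves the values as strings. As a rough sketch of how the same idea could cover all three statements, the block below parses a small stand-in document trimmed from the XML in the question, using the standard library's `xml.etree.ElementTree` instead of Beautiful Soup. The names `SAMPLE_XML`, `statement_rows`, `coa_names` and `to_frames` are made up for this illustration, and it assumes every `lineItem` holds a number and every statement has a `StatementDate`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny stand-in for the IB report, trimmed from the XML in the question.
SAMPLE_XML = """<ReportFinancialStatements>
  <FinancialStatements>
    <COAMap>
      <mapItem coaItem="SREV" lineID="100" precision="1" statementType="INC">Revenue</mapItem>
    </COAMap>
    <AnnualPeriods>
      <FiscalPeriod EndDate="2019-09-28" FiscalYear="2019" Type="Annual">
        <Statement Type="INC">
          <FPHeader><StatementDate>2019-09-28</StatementDate></FPHeader>
          <lineItem coaCode="SREV">260174.000000</lineItem>
          <lineItem coaCode="VDES">11.885790</lineItem>
        </Statement>
        <Statement Type="BAL">
          <FPHeader><StatementDate>2019-09-28</StatementDate></FPHeader>
          <lineItem coaCode="ACSH">12204.000000</lineItem>
        </Statement>
      </FiscalPeriod>
      <FiscalPeriod EndDate="2018-09-29" FiscalYear="2018" Type="Annual">
        <Statement Type="INC">
          <FPHeader><StatementDate>2018-09-29</StatementDate></FPHeader>
          <lineItem coaCode="SREV">265595.000000</lineItem>
          <lineItem coaCode="VDES">12.208930</lineItem>
        </Statement>
      </FiscalPeriod>
    </AnnualPeriods>
  </FinancialStatements>
</ReportFinancialStatements>"""


def statement_rows(xml_text, period_type="Annual"):
    """Return {statement type: [one dict per period]} without naming any column."""
    root = ET.fromstring(xml_text)
    rows = defaultdict(list)
    for period in root.iter("FiscalPeriod"):
        if period.get("Type") != period_type:
            continue
        for statement in period.iter("Statement"):
            row = {"Date": statement.findtext("FPHeader/StatementDate")}
            # Every coaCode found becomes a column, so no header list is needed.
            for item in statement.iter("lineItem"):
                row[item.get("coaCode")] = float(item.text)  # assumes numeric text
            rows[statement.get("Type")].append(row)
    return dict(rows)


def coa_names(xml_text):
    """Map each coaItem code in the COAMap to its readable label."""
    root = ET.fromstring(xml_text)
    return {m.get("coaItem"): m.text for m in root.iter("mapItem")}


def to_frames(rows):
    """Build one date-indexed DataFrame per statement type."""
    import pandas as pd  # imported lazily so the parsing step runs without pandas

    return {
        kind: pd.DataFrame(records).set_index("Date").sort_index()
        for kind, records in rows.items()
    }


rows = statement_rows(SAMPLE_XML)
names = coa_names(SAMPLE_XML)
```

With pandas installed, `to_frames(rows)` should give a dict with one DataFrame per statement type (`"INC"`, `"BAL"`, and `"CAS"` when present), each with one row per date and one column per `coaCode`. If readable headers are preferred, `frame.rename(columns=names)` would swap codes such as `SREV` for the labels in the `COAMap`. Unlike the script above, this sketch keeps reclassified statements, so a filter on `UpdateType` would still be needed to match its behaviour.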