Pandas read_html()缺少列

时间:2016-12-30 10:13:46

标签: python html pandas

我正在使用以下read_html()调用来读取表格(在付费专区后面):

df = pd.read_html('http://markets.ft.com/data/equities/tearsheet/' + 
              'financials?s=BAG:LSE&subView=BalanceSheet&periodType=a')[0]

它解析得很好,除了缺少最后两列。我使用的是Anaconda的最新版本(Python 3.5,pandas 0.18.1,html5lib,BeautifulSoup4)。

输出的开头如下所示:

                Fiscal data as of Jan 30 2016  2016    2015    2014
                                      ASSETS   NaN     NaN     NaN
             Cash And Short Term Investments  6.80      25      13
                      Total Receivables, Net    50      49      45
                             Total Inventory    16      17      16

(太大而无法全部显示)

HTML的开头如下所示:

<table class="mod-ui-table">
            <thead>
                <tr>
                    <th class="mod-ui-table__header--text">Fiscal data as of Jan 30 2016</th>
                    <th>2016</th>
                    <th class="mod-ui-hide-xsmall">2015</th>
                    <th class="mod-ui-hide-xsmall">2014</th>
                    <th class="mod-ui-hide-xsmall">2013</th>
                    <th class="mod-ui-hide-xsmall">2012</th>
                </tr>
            </thead>
            <tr class="mod-ui-table__row--section-header">
                <th colspan="6">ASSETS</th>
            </tr>
            <tr class="mod-ui-table__row--striped">
                <th class="mod-ui-table__header--row-label">Cash And Short Term Investments</th>
                <td>6.80</td>
                <td class="mod-ui-hide-xsmall">25</td>
                <td class="mod-ui-hide-xsmall">13</td>
                <td class="mod-ui-hide-xsmall">0.91</td>
                <td class="mod-ui-hide-xsmall">8.29</td>
            </tr>
            <tr>
                <th class="mod-ui-table__header--row-label">Total Receivables, Net</th>
                <td>50</td>
                <td class="mod-ui-hide-xsmall">49</td>
                <td class="mod-ui-hide-xsmall">45</td>
                <td class="mod-ui-hide-xsmall">42</td>
                <td class="mod-ui-hide-xsmall">37</td>
            </tr>

HTML的结尾如下所示:

<tr class="mod-ui-table__row--highlight">
                    <th class="mod-ui-table__header--row-label">Total liabilities &amp; shareholders&#39; equity</th>
                    <td>269</td>
                    <td class="mod-ui-hide-xsmall">255</td>
                    <td class="mod-ui-hide-xsmall">227</td>
                    <td class="mod-ui-hide-xsmall">215</td>
                    <td class="mod-ui-hide-xsmall">196</td>
                </tr>
                <tr class="mod-ui-table__row--striped">
                    <th class="mod-ui-table__header--row-label">Total common shares outstanding</th>
                    <td>117</td>
                    <td class="mod-ui-hide-xsmall">117</td>
                    <td class="mod-ui-hide-xsmall">117</td>
                    <td class="mod-ui-hide-xsmall">117</td>
                    <td class="mod-ui-hide-xsmall">117</td>
                </tr>
                <tr>
                    <th class="mod-ui-table__header--row-label">Treasury shares - common primary issue</th>
                    <td>0</td>
                    <td class="mod-ui-hide-xsmall">0</td>
                    <td class="mod-ui-hide-xsmall">0</td>
                    <td class="mod-ui-hide-xsmall">0</td>
                    <td class="mod-ui-hide-xsmall">--</td>
                </tr>
            </table>

如果不能立即明白什么可能是错的,我会感谢一些关于如何开始逐步执​​行read_html()代码以找到问题根源的提示。我现在是Python / pdb的新手。

1 个答案:

答案 0 :(得分:0)

事实证明,如果您没有登录FT网站,您只能获得三年的数据。

所以我现在正在研究如何登录FT网站(也许使用斜纹)。

有一个相关问题here