Question

I have this simple one-line script:

from pandas import read_html

print(read_html('http://money.cnn.com/data/hotstocks/', flavor='bs4'))

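It works, but the column names are missing: they show up as 0, 1, 2, 3. Is there an easy way to tell pandas to use the first row as the column names? I know I could store the names in a list, set them, and then skip the first row, but I'm wondering whether there is a simpler or better way.

This is what it currently prints: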

                           0       1       2         3
0                    Company   Price  Change  % Change
1             AAPL Apple Inc  115.31   +6.17    +5.65%
2   BAC Bank of America Corp   15.20   -0.43    -2.75%
3            YHOO Yahoo! Inc   46.46   -1.53    -3.19%
4        MSFT Microsoft Corp   41.19   -1.47    -3.45%
5            FB Facebook Inc   76.24   +0.46    +0.61%
6     GE General Electric Co   23.84   -0.54    -2.21%
7                 T AT&T Inc   32.68   -0.13    -0.40%
8            F Ford Motor Co   14.46   -0.24    -1.63%
9            INTC Intel Corp   33.78   -0.41    -1.20%
10    CSCO Cisco Systems Inc   26.80   -0.09    -0.35%

Answer 1

`read_html` takes a `header` parameter. You can pass it a row index:

read_html('http://money.cnn.com/data/hotstocks/', header=0, flavor='bs4')

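It is worth noting this caveat in the docs:

For example, you might need to manually assign column names if the column names are converted to NaN when you pass the header=0 argument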

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.html.read_html.html
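If `header=0` does produce NaN column names, the manual fallback the question describes might look roughly like the sketch below. `read_html` returns a list of DataFrames, so you would pick one table out of that list first. Here `raw` is a hand-built stand-in for such a table (a few rows copied from the output in the question, not a live fetch), and `promote_first_row` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Hand-built stand-in for one table from the list that read_html returns;
# the rows are copied from the output shown in the question (assumption:
# every cell arrives as a string, including the header row).
raw = pd.DataFrame([
    ["Company", "Price", "Change", "% Change"],
    ["AAPL Apple Inc", "115.31", "+6.17", "+5.65%"],
    ["BAC Bank of America Corp", "15.20", "-0.43", "-2.75%"],
])


def promote_first_row(df):
    """Return a new DataFrame whose column names come from df's first row."""
    # Drop the first row and renumber the remaining rows from 0.
    out = df.iloc[1:].reset_index(drop=True)
    # Use the values of the original first row as the column labels.
    out.columns = df.iloc[0].tolist()
    return out


table = promote_first_row(raw)
print(table)
```

The values stay as strings after this step, so columns such as Price would still need an explicit conversion before doing arithmetic on them.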
