Scraping data from a table

Time: 2016-03-08 09:15:50

Tags: python web-scraping

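I want to take the annual income statement, balance sheet and cash flow statement from this page, https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei, and put them into a DataFrame. As you can see, you can switch the data shown by clicking different parts of the page. Can someone tell me how to scrape the annual income statement? This is what I have so far. I can see the data in the soup, but I don't know how to get it out.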

from bs4 import BeautifulSoup
import requests
import pandas as pd

# empty DataFrame to fill with the scraped data
df = pd.DataFrame()
url = 'https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei'
# send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows; Windows NT 6.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5'}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')

2 answers:

Answer 0 (score: 2)

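Why not use pandas' read_html() function? It gives you a list of DataFrames (dfs), one for each table that can be shown by clicking the options on the page (the annual income statement among them):

import pandas as pd
# read_html needs lxml (or bs4 + html5lib) and returns a list of DataFrames, one per <table>
dfs = pd.read_html("https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei")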
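read_html returns a plain list, so picking out the annual income statement is still up to you. Its `match` parameter can pre-filter (`pd.read_html(url, match="Revenue")` keeps only tables whose text matches the string or regex). Below is a minimal sketch of filtering the returned list instead, assuming (not verified against the live page) that annual tables label their period columns "12 months ending ..." the way the quarterly table in the other answer uses "3 months ending ...", and that the income statement is the table with a "Revenue" row. `pick_tables` is a hypothetical helper name.

```python
import pandas as pd


def pick_tables(tables, period="12 months ending", row_label="Revenue"):
    """Return the DataFrames whose column headers mention `period`
    and whose first column contains a row labelled `row_label`.

    Hypothetical helper: both default strings are assumptions about the page.
    """
    picked = []
    for df in tables:
        # does any column header name the wanted reporting period?
        has_period = any(period in str(col) for col in df.columns)
        # does the first (label) column contain the wanted line item?
        has_row = df.iloc[:, 0].astype(str).str.strip().eq(row_label).any()
        if has_period and has_row:
            picked.append(df)
    return picked
```

Usage would be something like `annual_income = pick_tables(pd.read_html(url))`; inspect the result before relying on the assumed labels.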

Answer 1 (score: 2)

pandas' read_html is probably what you want, but since you asked about bs4: if you want to build the tables yourself, it is easy to do with bs4:

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei'


def get_rows(table):
    # pull the cell text from each row, skipping rows with no <td> cells (e.g. the header row)
    for row in table.select("tr"):
        row = [x.text.strip() for x in row.select("td")]
        if row:
            yield row


def get_tables():
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # get all the tables using the id selector
    for table in soup.select("#fs-table"):
        # create columns from the th tags in the thead and get the rows from the helper
        columns = [x.text.strip() for x in table.find("thead").find_all("th")]
        yield pd.DataFrame(list(get_rows(table)), columns=columns)
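To test the row-extraction logic offline (or without installing bs4), the same idea can be sketched with the standard library's `html.parser.HTMLParser`. This swaps bs4 for a hand-written event parser; `FsTableParser` is a hypothetical name, it assumes the target tables are not nested inside each other, and it is a sketch rather than a drop-in replacement.

```python
from html.parser import HTMLParser


class FsTableParser(HTMLParser):
    """Collect header cells and data rows from every <table id="fs-table">.

    Hypothetical helper; assumes the target tables are not nested in each other.
    """

    def __init__(self, table_id="fs-table"):
        super().__init__()
        self.table_id = table_id
        self.tables = []    # one {"header": [...], "rows": [[...], ...]} per table
        self._in_table = False
        self._row = None    # cells of the <tr> being read, or None
        self._cell = None   # text chunks of the <td>/<th> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self._in_table = True
            self.tables.append({"header": [], "rows": []})
        elif not self._in_table:
            return
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        # only keep text that sits inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            if tag == "th":
                self.tables[-1]["header"].append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr":
            # like get_rows above: keep only rows that contained <td> cells
            if self._row:
                self.tables[-1]["rows"].append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False
```

Feed it the page text (`parser.feed(requests.get(url).text)`), then build one frame per table with `pd.DataFrame(t["rows"], columns=t["header"])`.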

This gives you 6 tables; calling next(get_tables()) shows the first one:

In [4]: next(get_tables())
Out[4]: 
     In Millions of USD (except for per share items)  \
0                                            Revenue   
1                               Other Revenue, Total   
2                                      Total Revenue   
3                             Cost of Revenue, Total   
4                                       Gross Profit   
5             Selling/General/Admin. Expenses, Total   
6                             Research & Development   
7                          Depreciation/Amortization   
8           Interest Expense(Income) - Net Operating   
9                           Unusual Expense (Income)   
10                   Other Operating Expenses, Total   
11                           Total Operating Expense   
12                                  Operating Income   
13       Interest Income(Expense), Net Non-Operating   
14                     Gain (Loss) on Sale of Assets   
15                                        Other, Net   
16                                 Income Before Tax   
17                                  Income After Tax   
18                                 Minority Interest   
19                              Equity In Affiliates   
20                    Net Income Before Extra. Items   
21                                 Accounting Change   
22                           Discontinued Operations   
23                                Extraordinary Item   
24                                        Net Income   
25                               Preferred Dividends   
26      Income Available to Common Excl. Extra Items   
27      Income Available to Common Incl. Extra Items   
28                     Basic Weighted Average Shares   
29           Basic EPS Excluding Extraordinary Items   
30           Basic EPS Including Extraordinary Items   
31                               Dilution Adjustment   
32                   Diluted Weighted Average Shares   
33         Diluted EPS Excluding Extraordinary Items   
34         Diluted EPS Including Extraordinary Items   
35  Dividends per Share - Common Stock Primary Issue   
36                    Gross Dividends - Common Stock   
37        Net Income after Stock Based Comp. Expense   
38         Basic EPS after Stock Based Comp. Expense   
39       Diluted EPS after Stock Based Comp. Expense   
40                        Depreciation, Supplemental   
41                               Total Special Items   
42                    Normalized Income Before Taxes   
43           Effect of Special Items on Income Taxes   
44          Income Taxes Ex. Impact of Special Items   
45                     Normalized Income After Taxes   
46                 Normalized Income Avail to Common   
47                              Basic Normalized EPS   
48                            Diluted Normalized EPS   

   3 months ending 2015-12-31 3 months ending 2015-09-30  \
0                   22,059.00                  19,280.00   
1                           -                          -   
2                   22,059.00                  19,280.00   
3                   10,652.00                   9,844.00   
4                   11,407.00                   9,436.00   
5                    5,101.00                   4,465.00   
6                    1,362.00                   1,287.00   
7                       80.00                      73.00   
8                           -                          -   
9                       12.00                     112.00   
10                    -519.00                     -32.00   
11                  16,962.00                  15,659.00   
12                   5,097.00                   3,621.00   
13                          -                          -   
14                          -                          -   
15                       2.00                          -   
16                   5,099.00                   3,621.00   
17                   4,461.00                   2,962.00   
18                          -                          -   
19                          -                          -   
20                   4,461.00                   2,962.00   
21                          -                          -   
22                          -                          -   
23                          -                          -   
24                   4,464.00                   2,950.00   
25                          -                          -   
26                   4,460.00                   2,962.00   
27                   4,463.00                   2,950.00   
28                          -                          -   
29                          -                          -   
30                          -                          -   
31                          -                       0.00   
32                     972.84                     978.96   
33                       4.58                       3.03   
34                          -                          -   
35                       1.30                       1.30   
36                          -                          -   
37                          -                          -   
38                          -                          -   
39                          -                          -   
40                          -                          -   
41                          -                          -   
42                          -                          -   
43                          -                          -   
44                          -                          -   
45                          -                          -   
46                          -                          -   
47                          -                          -   
48                       4.87                       2.98   

   3 months ending 2015-06-30 3 months ending 2015-03-31  \
0                   20,813.00                  19,590.00   
1                           -                          -   
2                   20,813.00                  19,590.00   
3                   10,423.00                  10,138.00   
4                   10,390.00                   9,452.00   
5                    4,923.00                   5,022.00   
6                    1,300.00                   1,298.00   
7                       72.00                      79.00   
8                           -                          -   
9                      178.00                     285.00   
10                    -190.00                     -20.00   
11                  16,716.00                  16,762.00   
12                   4,097.00                   2,828.00   
13                          -                          -   
14                          -                          -   
15                     127.00                     173.00   
16                   4,224.00                   3,001.00   
17                   3,526.00                   2,416.00   
18                          -                          -   
19                          -                          -   
20                   3,526.00                   2,416.00   
21                          -                          -   
22                          -                          -   
23                          -                          -   
24                   3,449.00                   2,328.00   
25                          -                          -   
26                   3,526.00                   2,416.00   
27                   3,449.00                   2,328.00   
28                          -                          -   
29                          -                          -   
30                          -                          -   
31                          -                          -   
32                     986.70                     992.30   
33                       3.57                       2.43   
34                          -                          -   
35                       1.30                       1.10   
36                          -                          -   
37                          -                          -   
38                          -                          -   
39                          -                          -   
40                          -                          -   
41                          -                          -   
42                          -                          -   
43                          -                          -   
44                          -                          -   
45                          -                          -   
46                          -                          -   
47                          -                          -   
48                       3.56                       2.52   

   3 months ending 2014-12-31  
0                   24,113.00  
1                           -  
2                   24,113.00  
3                   11,251.00  
4                   12,862.00  
5                    5,375.00  
6                    1,320.00  
7                       93.00  
8                           -  
9                      578.00  
10                    -317.00  
11                  17,018.00  
12                   7,095.00  
13                          -  
14                          -  
15                          -  
16                   7,095.00  
17                   5,516.00  
18                          -  
19                          -  
20                   5,516.00  
21                          -  
22                          -  
23                          -  
24                   5,485.00  
25                          -  
26                   5,514.00  
27                   5,483.00  
28                          -  
29                          -  
30                          -  
31                          -  
32                     995.30  
33                       5.54  
34                          -  
35                       1.10  
36                          -  
37                          -  
38                          -  
39                          -  
40                          -  
41                          -  
42                          -  
43                          -  
44                          -  
45                          -  
46                          -  
47                          -  
48                       6.65  
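Every value in these frames is still a string such as "22,059.00", and "-" marks a missing value. A minimal sketch of turning the cells into numbers, assuming "-" always means "no data" and commas are only thousands separators; `to_number` is a hypothetical helper:

```python
def to_number(cell):
    """Convert a cell such as '22,059.00' or '-519.00' to a float.

    Assumes '-' (or an empty cell) means "no data" and returns None for it.
    """
    text = cell.strip()
    if text in ("", "-"):
        return None
    # drop thousands separators before parsing
    return float(text.replace(",", ""))
```

With pandas, a vectorised equivalent for one column is `pd.to_numeric(df[col].str.replace(",", ""), errors="coerce")`, which turns "-" into NaN instead of None.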

Or use lxml with XPath:

import requests
import pandas as pd
from lxml.etree import fromstring, HTMLParser

url = 'https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei'


def get_rows(table):
    for row in table.xpath(".//tr"):
        # string() returns the full text of each <td>, so every cell yields exactly one value
        row = [td.xpath("string()").strip() for td in row.xpath("./td")]
        if row:
            yield row


def get_tables():
    r = requests.get(url)
    xml = fromstring(r.content, HTMLParser())
    for table in xml.xpath("//table[@id='fs-table']"):
        columns = [th.xpath("string()").strip() for th in table.xpath(".//th")]
        yield pd.DataFrame(list(get_rows(table)), columns=columns)
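`./td/text()` only returns text nodes that are direct children of a `<td>`, so if a cell (hypothetically) wrapped its value in a `<span>`, that cell would contribute nothing and the row would come back one value short, shifting the later columns. A safer pattern is to take each cell's full text. The sketch below shows that offline with the standard library's `xml.etree.ElementTree`, whose limited XPath subset supports `.//table[@id='...']`. It is a stand-in for lxml that only accepts well-formed markup, so it is for checking the logic, not for parsing the live page; `table_rows` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def table_rows(markup, table_id="fs-table"):
    """Yield (header, rows) for each <table id=table_id> in well-formed markup."""
    root = ET.fromstring(markup)
    for table in root.iterfind(f".//table[@id='{table_id}']"):
        header = ["".join(th.itertext()).strip() for th in table.iterfind(".//th")]
        rows = []
        for tr in table.iterfind(".//tr"):
            # full text of every <td>, including text inside nested tags,
            # so each cell yields exactly one entry and columns stay aligned
            cells = ["".join(td.itertext()).strip() for td in tr.findall("td")]
            if cells:  # skip the header row, which has only <th> cells
                rows.append(cells)
        yield header, rows
```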