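I want to scrape the annual income statement, balance sheet and cash flow statement from this page: https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei
and put them into a data frame. As you can see, clicking different parts of the page changes the data that is shown. Can anyone tell me how to scrape the annual income statement? This is what I have so far. I can see the data in the soup, but I don't know how to get it out.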
from bs4 import BeautifulSoup
import requests
import pandas as pd
df = pd.DataFrame()
url = 'https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei'
headers = {'User-Agent': 'Mozilla/5.0 (Windows; Windows NT 6.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5'}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text,'html.parser')
Answer 0 (score: 2)
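Why not use pandas' read_html() function? It returns a list of data frames (df), one for each table that the page can show when you click the options (the annual income statement is among them):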
import pandas as pd
df = pd.read_html("https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei")
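Since read_html() returns a list, the annual tables still have to be picked out of it. Below is a minimal sketch of one way to do that by looking at the column headers. The helper name pick_tables is hypothetical, the two small frames stand in for the real list and contain made-up values, and the "12 months ending" header text is an assumption about how the annual tables are labelled, so check the actual headers first.

```python
import pandas as pd


def pick_tables(tables, header_text):
    """Return the frames that have a column header containing header_text."""
    return [
        df for df in tables
        if any(header_text in str(col) for col in df.columns)
    ]


# Stand-ins for the list returned by pd.read_html(); the values are made up.
quarterly = pd.DataFrame({
    "In Millions of USD": ["Revenue"],
    "3 months ending 2015-12-31": ["1.00"],
})
annual = pd.DataFrame({
    "In Millions of USD": ["Revenue"],
    # Assumed label for annual data; not confirmed by the page.
    "12 months ending 2015-12-31": ["4.00"],
})

annual_tables = pick_tables([quarterly, annual], "12 months ending")
print(len(annual_tables))
```

Matching on header text rather than on a fixed list position should keep working if the order of the tables on the page changes.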
Answer 1 (score: 2)
pandas read_html is probably what you want, but since you asked about bs4: if you would rather build the tables yourself, that is easy to do with bs4:
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei'

def get_rows(table):
    # pull the data from each row, ignoring rows with no text
    for row in table.select("tr"):
        row = [x.text.strip() for x in row.select("td")]
        if row:
            yield row

def get_tables():
    r = requests.get(url)
    # name the parser explicitly so the result does not depend on which parsers are installed
    soup = BeautifulSoup(r.content, "html.parser")
    # get all the tables using the id selector
    for table in soup.select("#fs-table"):
        # create columns from th tags in the thead and get rows from helper
        yield pd.DataFrame(list(get_rows(table)), columns=[x.text.strip() for x in table.find("thead").find_all("th")])
That gives you 6 tables. If we call next(get_tables()), you can see the first one:
In [4]: next(get_tables())
Out[4]:
In Millions of USD (except for per share items) \
0 Revenue
1 Other Revenue, Total
2 Total Revenue
3 Cost of Revenue, Total
4 Gross Profit
5 Selling/General/Admin. Expenses, Total
6 Research & Development
7 Depreciation/Amortization
8 Interest Expense(Income) - Net Operating
9 Unusual Expense (Income)
10 Other Operating Expenses, Total
11 Total Operating Expense
12 Operating Income
13 Interest Income(Expense), Net Non-Operating
14 Gain (Loss) on Sale of Assets
15 Other, Net
16 Income Before Tax
17 Income After Tax
18 Minority Interest
19 Equity In Affiliates
20 Net Income Before Extra. Items
21 Accounting Change
22 Discontinued Operations
23 Extraordinary Item
24 Net Income
25 Preferred Dividends
26 Income Available to Common Excl. Extra Items
27 Income Available to Common Incl. Extra Items
28 Basic Weighted Average Shares
29 Basic EPS Excluding Extraordinary Items
30 Basic EPS Including Extraordinary Items
31 Dilution Adjustment
32 Diluted Weighted Average Shares
33 Diluted EPS Excluding Extraordinary Items
34 Diluted EPS Including Extraordinary Items
35 Dividends per Share - Common Stock Primary Issue
36 Gross Dividends - Common Stock
37 Net Income after Stock Based Comp. Expense
38 Basic EPS after Stock Based Comp. Expense
39 Diluted EPS after Stock Based Comp. Expense
40 Depreciation, Supplemental
41 Total Special Items
42 Normalized Income Before Taxes
43 Effect of Special Items on Income Taxes
44 Income Taxes Ex. Impact of Special Items
45 Normalized Income After Taxes
46 Normalized Income Avail to Common
47 Basic Normalized EPS
48 Diluted Normalized EPS
3 months ending 2015-12-31 3 months ending 2015-09-30 \
0 22,059.00 19,280.00
1 - -
2 22,059.00 19,280.00
3 10,652.00 9,844.00
4 11,407.00 9,436.00
5 5,101.00 4,465.00
6 1,362.00 1,287.00
7 80.00 73.00
8 - -
9 12.00 112.00
10 -519.00 -32.00
11 16,962.00 15,659.00
12 5,097.00 3,621.00
13 - -
14 - -
15 2.00 -
16 5,099.00 3,621.00
17 4,461.00 2,962.00
18 - -
19 - -
20 4,461.00 2,962.00
21 - -
22 - -
23 - -
24 4,464.00 2,950.00
25 - -
26 4,460.00 2,962.00
27 4,463.00 2,950.00
28 - -
29 - -
30 - -
31 - 0.00
32 972.84 978.96
33 4.58 3.03
34 - -
35 1.30 1.30
36 - -
37 - -
38 - -
39 - -
40 - -
41 - -
42 - -
43 - -
44 - -
45 - -
46 - -
47 - -
48 4.87 2.98
3 months ending 2015-06-30 3 months ending 2015-03-31 \
0 20,813.00 19,590.00
1 - -
2 20,813.00 19,590.00
3 10,423.00 10,138.00
4 10,390.00 9,452.00
5 4,923.00 5,022.00
6 1,300.00 1,298.00
7 72.00 79.00
8 - -
9 178.00 285.00
10 -190.00 -20.00
11 16,716.00 16,762.00
12 4,097.00 2,828.00
13 - -
14 - -
15 127.00 173.00
16 4,224.00 3,001.00
17 3,526.00 2,416.00
18 - -
19 - -
20 3,526.00 2,416.00
21 - -
22 - -
23 - -
24 3,449.00 2,328.00
25 - -
26 3,526.00 2,416.00
27 3,449.00 2,328.00
28 - -
29 - -
30 - -
31 - -
32 986.70 992.30
33 3.57 2.43
34 - -
35 1.30 1.10
36 - -
37 - -
38 - -
39 - -
40 - -
41 - -
42 - -
43 - -
44 - -
45 - -
46 - -
47 - -
48 3.56 2.52
3 months ending 2014-12-31
0 24,113.00
1 -
2 24,113.00
3 11,251.00
4 12,862.00
5 5,375.00
6 1,320.00
7 93.00
8 -
9 578.00
10 -317.00
11 17,018.00
12 7,095.00
13 -
14 -
15 -
16 7,095.00
17 5,516.00
18 -
19 -
20 5,516.00
21 -
22 -
23 -
24 5,485.00
25 -
26 5,514.00
27 5,483.00
28 -
29 -
30 -
31 -
32 995.30
33 5.54
34 -
35 1.10
36 -
37 -
38 -
39 -
40 -
41 -
42 -
43 -
44 -
45 -
46 -
47 -
48 6.65
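The cells in the frame above are still strings such as "22,059.00" and "-", so they need converting before any arithmetic. Here is a minimal sketch of one way to do it. The helper name to_numbers is hypothetical, and treating a lone "-" as a missing value is an assumption about what the page means by it.

```python
import pandas as pd


def to_numbers(df):
    """Use the first column as row labels and parse the rest as floats."""
    out = df.set_index(df.columns[0])
    for col in out.columns:
        # Drop thousands separators; a lone "-" cannot be parsed and becomes NaN.
        cleaned = out[col].astype(str).str.replace(",", "", regex=False)
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


# A few cells copied from the output above.
sample = pd.DataFrame({
    "In Millions of USD (except for per share items)": [
        "Revenue", "Other Revenue, Total", "Other Operating Expenses, Total"],
    "3 months ending 2015-12-31": ["22,059.00", "-", "-519.00"],
})
numbers = to_numbers(sample)
print(numbers)
```

Note that errors="coerce" also turns any other unparseable cell into NaN without complaint, so it is worth checking the result against the page.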
Or use lxml with XPath:
import requests
from lxml.etree import fromstring, HTMLParser
import pandas as pd

url = 'https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei'

def get_rows(table):
    # collect the text of each cell, skipping rows that have no td text
    for row in table.xpath(".//tr"):
        row = row.xpath("./td/text()")
        if row:
            yield row

def get_tables():
    r = requests.get(url)
    # parse the response as HTML rather than strict XML
    xml = fromstring(r.content, HTMLParser())
    for table in xml.xpath("//table[@id='fs-table']"):
        # column names come from the th cells, rows from the helper above
        yield pd.DataFrame(list(get_rows(table)), columns=[x.strip() for x in table.xpath(".//th/text()")])
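One thing to watch with ./td/text(): it only returns text that sits directly inside the td, so a value wrapped in a child tag would be dropped and that row would come out shorter than the header. Whether the page does this is an assumption; the markup below is made up purely to show the difference, and get_rows_nested is a hypothetical variant of the helper.

```python
from lxml.etree import fromstring, HTMLParser

# Made-up markup: the <span> around the negative value is an assumption.
html = """
<table id="fs-table">
  <thead><tr><th>Item</th><th>Q4</th></tr></thead>
  <tbody>
    <tr><td>Other Operating Expenses, Total</td><td><span>-519.00</span></td></tr>
  </tbody>
</table>
"""


def get_rows_nested(table):
    """Yield one list of cell strings per row, including text inside child tags."""
    for row in table.xpath(".//tr"):
        cells = ["".join(td.itertext()).strip() for td in row.xpath("./td")]
        if cells:
            yield cells


table = fromstring(html, HTMLParser()).xpath("//table[@id='fs-table']")[0]
# Direct text nodes only: the value inside <span> is missing from the row.
direct = [row.xpath("./td/text()") for row in table.xpath(".//tr")]
# itertext() walks into child elements, so the value is kept.
nested = list(get_rows_nested(table))
print(direct)
print(nested)
```

If the rows and the header ever disagree in length when building the DataFrame, this is one of the first things worth checking.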