Question

我知道这里有很多这样的问题。但是我无法用那些QnA解决我的问题。

请先查看解释我情况的[图片]。 ==＆GT; http://i.stack.imgur.com/cCeY2.jpg

我的最终目标是获取红色矩形-1的值。所以我尝试用红色矩形2中的标记读取红色矩形-3和美丽的汤。但我不能。实际上我无法在红色矩形-2中获得任何标记。

以下是我的代码。有什么问题？我错过了什么？

＃

from bs4 import BeautifulSoup 
from urllib.request import urlopen

res = urlopen('http://finance.naver.com/item/main.nhn?code=005490')
soup = BeautifulSoup(res, 'html.parser')

#
# I can't find 'table' tag in red rectangle-2
tag0 = soup.find_all('table', {'class':'gHead03'})

i = 0
while i < tag0.__len__():
    print(tag0[i])
    i = i + 1
    print('\n', i)


# But I can find 'ul'tag out of red rectangle-2
tag1 = soup.find_all('ul', {'class':'tabs_submenu tab_total_submenu'})

i = 0
while i < tag1.__len__():
    print(tag1[i])
    i = i + 1
    print('\n', i)
######################################################################

Answer 1

如果您查看根据图片使用正确的网址返回的来源，您将看不到该表，因为使用不同的网址检索html，您可以在chrome dev工具的doc标签下看到：< / p>

所以你需要做的就是传递一个代码：

from bs4 import BeautifulSoup

params = {"cmp_cd":"005490"}
res = requests.get("http://companyinfo.stock.naver.com/v1/company/c1010001.aspx", params=params)
soup = BeautifulSoup(res.content, 'html.parser')
tag0 = soup.find('table', {'class':'gHead03'})

print(tag0)

soup.find_all('ul', {'class':'tabs_submenu tab_total_submenu'})的唯一原因是因为页面上还有其他 ul.tabs_submenu tab_total_submenu 标记不在 ghead03 表中，您实际上并不是从桌子上拉出任何东西。

如果我们运行代码，你可以看到我们得到了表：

In [2]: from bs4 import BeautifulSoup

In [3]: import requests

In [4]: params = {"cmp_cd":"005490"}

In [5]: res = requests.get("http://companyinfo.stock.naver.com/v1/company/c1010001.aspx", params=params)

In [6]: soup = BeautifulSoup(res.content, 'html.parser')

In [7]: tag0 = soup.find('table', {'class':'gHead03'})

In [8]: print(tag0)
<table class="gHead03" ...............

我无法发布完整的输出，因为中文字符被标记为垃圾邮件，但它就在那里。

为什么我无法使用BeautifulSoup获取标签

1 个答案: