I am trying to get links from a website with this code:
import requests
from bs4 import BeautifulSoup

def get_links(max_pages):
    page = 1
    while page <= max_pages:
        address = 'http://hamariweb.com/mobiles/nokia_mobile-phones1.aspx?Page=' + str(page)
        source_code = requests.get(address)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class': 'TextClass8pt'}):
            href = link.get("href")
            print(href)
        page += 1

get_links(3)
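It gives the expected output. But when I try this address instead:
address = 'http://propakistani.pk/category/cellular/page/' + str(page)
with soup.findAll('a', {'class': 'aa_art_hdng'}) as the search condition,
it shows this error:
TypeError: getresponse() got an unexpected keyword argument 'buffering'
I also tried another website, but that time it showed neither any output nor any error. Why does the code give correct output for one website but not for the others? Is there a problem with my code? Please help me. Thanks.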
Answer 0 (score: 1)
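The condition soup.findAll('a', {'class': 'TextClass8pt'}) matches only a tags with that class, so the class name must match the markup of the page being scraped.
Try the following.
Demo: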
import requests
from bs4 import BeautifulSoup

def get_links(max_pages):
    page = 1
    while page <= max_pages:
        address = 'http://propakistani.pk/category/cellular/page/' + str(page)
        source_code = requests.get(address)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a'):
            href = link.get("href")
            print(href)
        page += 1

get_links(3)
Or:
The a tags on that page have the class value aa_loop_h2a.
For example:
<a class="aa_loop_h2a" href="http://propakistani.pk/2015/04/20/mobile-data-usage-in-pakistan-grows-600-during-2014/" title="Mobile Data Usage in Pakistan Grows 600% During 2014">Mobile Data Usage in Pakistan Grows 600% During 2014</a>
Try the condition soup.findAll('a', {'class': 'aa_loop_h2a'}).
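The class filter can be checked offline before pointing the scraper at the live site. The following is a minimal sketch, not the site's actual markup: SAMPLE_HTML reuses the a tag quoted above plus one hypothetical link with a different class, and the helper names extract_links and classes_on_links are made up for illustration. It also passes "html.parser" explicitly, which is an assumption about the preferred parser.

```python
from bs4 import BeautifulSoup

# Sample markup: the first <a> is copied from the answer above; the second
# is a hypothetical link, added only to show that the filter excludes it.
SAMPLE_HTML = """
<a class="aa_loop_h2a" href="http://propakistani.pk/2015/04/20/mobile-data-usage-in-pakistan-grows-600-during-2014/" title="Mobile Data Usage in Pakistan Grows 600% During 2014">Mobile Data Usage in Pakistan Grows 600% During 2014</a>
<a class="other_class" href="http://example.com/ignored">Ignored</a>
"""


def extract_links(html, css_class):
    """Return the href of every <a> tag whose class list contains css_class."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.find_all("a", {"class": css_class})]


def classes_on_links(html):
    """Return the sorted class names found on <a> tags, to help pick a filter."""
    soup = BeautifulSoup(html, "html.parser")
    return sorted({c for a in soup.find_all("a") for c in a.get("class", [])})


# A class that exists on the page yields links; one that does not yields
# an empty list and no error, which looks like "no output" in the loop.
print(extract_links(SAMPLE_HTML, "aa_loop_h2a"))
print(extract_links(SAMPLE_HTML, "aa_art_hdng"))
print(classes_on_links(SAMPLE_HTML))
```

If the list of classes printed for a real page does not contain the class you are filtering on, the for loop simply never runs, which would explain a script that prints nothing and raises no error.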