I want to find the ul tag whose id is 'thread_list'. If I pass headers as a parameter to requests.get(), the tag cannot be found. But if I leave the headers out, it works.
Here is my code:
from bs4 import BeautifulSoup
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
url = 'http://tieba.baidu.com/f?kw=acu&ie=utf-8&pn=0'
This one does not work (with headers, ul is None):
resWithHeaders = requests.get(url, headers=headers)
# Open in binary mode ('wb') so the UTF-8 bytes are written as-is on Python 2 and 3
with open('soup1.txt', 'wb') as f:
    f.write(resWithHeaders.text.encode('utf-8'))
soup1 = BeautifulSoup(resWithHeaders.text, 'html.parser')
ul = soup1.find('ul', {'id': 'thread_list'})
print(ul)
This one works (without headers):
resWithoutHeaders = requests.get(url)
# Open in binary mode ('wb') so the UTF-8 bytes are written as-is on Python 2 and 3
with open('soup2.txt', 'wb') as f:
    f.write(resWithoutHeaders.text.encode('utf-8'))
soup2 = BeautifulSoup(resWithoutHeaders.text, 'html.parser')
ul = soup2.find('ul', {'id': 'thread_list'})
print(ul)
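
One way to narrow down the difference between the two responses is to check where the id actually appears in each saved page. The sketch below is only a diagnostic aid under stated assumptions, not a confirmed explanation: it assumes Python 3, assumes the attribute is written with double quotes as `id="thread_list"`, and tests one hypothesis, namely that the server might wrap the list in an HTML comment for browser-like User-Agents. The names `CommentCollector` and `locate_id` are hypothetical helpers, not part of requests or BeautifulSoup.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the body of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called by HTMLParser once per <!-- ... --> block
        self.comments.append(data)


def locate_id(html, element_id):
    """Report where id="<element_id>" shows up in the raw HTML.

    Returns 'comment' if it appears inside an HTML comment,
    'markup' if it appears only outside comments,
    and 'absent' if the string is not in the page at all.
    Assumes the attribute is written with double quotes.
    """
    needle = 'id="%s"' % element_id
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    if any(needle in body for body in collector.comments):
        return 'comment'
    if needle in html:
        return 'markup'
    return 'absent'
```

With the script above, this could be called as `locate_id(resWithHeaders.text, 'thread_list')` and `locate_id(resWithoutHeaders.text, 'thread_list')`. If the first call returns 'comment', the tag is in the response but a parser will treat it as comment text rather than as an element, which would explain `find` returning None; if it returns 'absent', the server is sending different content altogether and the cause lies elsewhere.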