Python Beautiful Soup: getting Chinese text after a class tag

Time: 2014-04-04 06:24:06

Tags: python beautifulsoup

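I'm having trouble getting the Chinese text that immediately follows this class tag. I've looked at examples, and this is my code so far. I'm not sure what to do with my div variable. print div gives me a blank.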

from bs4 import BeautifulSoup
import requests

page = 'http://sbj.speiyou.com/search/index/subject:/grade:12/gtype:time'
r = requests.get(page)
r.encoding = 'utf-8'
soup = BeautifulSoup(r.text)

div = soup.findAll('div', {"class" : 'pagination mtop40'})
print div

I have already tried print div, print div.text, print div.string, and print div[0].

1 answer:

Answer 0 (score: 2)

There is only one such tag, so use soup.find() rather than soup.findAll():

div = soup.find('div', class_='pagination')
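
To make the difference concrete, here is a minimal sketch, assuming Python 3 and a small inline HTML snippet standing in for the real page (the markup and the /p/2 link are made up for illustration). It suggests why attribute access such as .text does not work on the result of findAll(): that call returns a list-like ResultSet, while find() returns a single Tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
html = '<div class="pagination mtop40"> 当前第1/17页 <a href="/p/2">2</a></div>'
soup = BeautifulSoup(html, "html.parser")

# find_all() is the modern spelling of findAll(); it returns a ResultSet.
matches = soup.find_all("div", class_="pagination")
# find() returns the first matching Tag, or None if nothing matches.
div = soup.find("div", class_="pagination")

print(type(matches).__name__)    # ResultSet
print(type(div).__name__)        # Tag
print(hasattr(matches, "text"))  # False: index into the list first
print(matches[0] == div)         # True
```

Matching on class_='pagination' alone works here because Beautiful Soup compares the value against each individual class of a multi-valued class attribute.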

The div contains both text nodes and tags. To get the first piece of text, use the .strings or .stripped_strings iterables; I prefer the stripped variant:

print next(div.stripped_strings, u'')

Demo:

>>> from bs4 import BeautifulSoup
>>> import requests
>>> page = 'http://sbj.speiyou.com/search/index/subject:/grade:12/gtype:time'
>>> r = requests.get(page)
>>> soup = BeautifulSoup(r.text)
>>> div = soup.find('div', class_='pagination')
>>> div
<div class="pagination mtop40">
                     当前第1/17页 【 首页 】 <span style="color:red;font-weight:bold;">1</span>  <a href="/search/index/grade:12/level:/subject:/gtype:time/service:/time:/term:/period:/o:da/bg:n/curpage:2">2</a> <a href="/search/index/grade:12/level:/subject:/gtype:time/service:/time:/term:/period:/o:da/bg:n/curpage:3">3</a> <a href="/search/index/grade:12/level:/subject:/gtype:time/service:/time:/term:/period:/o:da/bg:n/curpage:4">4</a> <a href="/search/index/grade:12/level:/subject:/gtype:time/service:/time:/term:/period:/o:da/bg:n/curpage:5">5</a> <a href="/search/index/grade:12/level:/subject:/gtype:time/service:/time:/term:/period:/o:da/bg:n/curpage:6">6</a> <a href="/search/index/grade:12/level:/subject:/gtype:time/service:/time:/term:/period:/o:da/bg:n/curpage:7">7</a> <a href="/search/index/grade:12/level:/subject:/gtype:time/service:/time:/term:/period:/o:da/bg:n/curpage:8">8</a><a href="/search/index/grade:12/level:/subject:/gtype:time/service:/time:/term:/period:/o:da/bg:n/curpage:17">尾页</a> </div>

>>> print next(div.stripped_strings, u'')
当前第1/17页 【 首页 】
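
The demo above depends on a live page, so the following is a self-contained sketch of the same idea, assuming Python 3 and a shortened, hand-written copy of the pagination markup (the /p/... links are placeholders, not the site's real URLs). It contrasts .strings with .stripped_strings and shows the default argument of next() at work.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the pagination div (an assumption, not the live HTML).
html = (
    '<div class="pagination mtop40">\n'
    '    当前第1/17页 【 首页 】 <span>1</span> <a href="/p/2">2</a>'
    '<a href="/p/17">尾页</a> </div>'
)
div = BeautifulSoup(html, "html.parser").find("div", class_="pagination")

# .strings yields every text node as-is, including surrounding whitespace.
raw = list(div.strings)
# .stripped_strings strips each node and skips whitespace-only nodes.
clean = list(div.stripped_strings)
# next() with a default avoids StopIteration when a tag has no text at all.
first = next(div.stripped_strings, "")

print(repr(raw[0]))
print(clean)
print(first)

empty = BeautifulSoup("<div></div>", "html.parser").div
print(repr(next(empty.stripped_strings, "")))  # ''
```

Taking only the first stripped string is what separates the leading text from the page-number links that follow it inside the same div.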

Note that there is no need to set r.encoding for this page; the server already gives you the encoding in the Content-Type header.
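
As a rough illustration of where that encoding comes from, the sketch below pulls the charset parameter out of a Content-Type header value using only the Python 3 standard library. This is not how requests is implemented internally; charset_from_content_type is a hypothetical helper name, and the header values are example inputs rather than what this particular server sends.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lowercases the charset and returns None if absent.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=utf-8"))  # utf-8
print(charset_from_content_type("text/html; charset=GBK"))    # gbk
print(charset_from_content_type("text/html"))                 # None
```

With requests, the equivalent information is available as r.headers['content-type'] and r.encoding after the call to requests.get(), so printing both is a quick way to check what the server declared before overriding anything.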