My script is printing each result twice, and I can't figure out why.
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup as bs

word = "mission"

with requests.Session() as s:
    r = s.get('http://www.tabula.ge/en')
    # Name the parser explicitly so results do not depend on what is installed
    soup = bs(r.text, 'html.parser')
    div = soup.find("div", {"class": "sets"})
    for i in div.find_all('li'):
        for text in i.find_all('a'):
            # Match against the visible link text, not the encoded tag markup
            if word in text.get_text():
                print(text.get_text().strip())
After running the script, I get the result twice in the printed output:
Kandelaki: Georgian UN mission yet to call security council meeting
Kandelaki: Georgian UN mission yet to call security council meeting
Answer 0 (score: 2)
The text you are searching for appears twice in the page source, so the script is correctly reporting both matches.
To view the source:
Paste view-source:http://www.tabula.ge/en into your browser's address bar,
or right-click the page and select "View Page Source".
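The same check can be done from Python by counting how often the headline occurs in the downloaded markup. This is only a rough sketch: `count_occurrences` is a hypothetical helper, and `sample_html` is a made-up stand-in for `r.text` so the snippet runs without a network connection.

```python
def count_occurrences(html, phrase):
    """Return the number of non-overlapping occurrences of phrase in html."""
    return html.count(phrase)


# Hypothetical stand-in for r.text: the same link repeated in two list items.
sample_html = (
    '<li><a href="/en/story/90354">Georgian UN mission yet to call meeting</a></li>'
    '<li><a href="/en/story/90354">Georgian UN mission yet to call meeting</a></li>'
)

count = count_occurrences(sample_html, "Georgian UN mission")
print(count)
```

A count greater than one suggests the page itself repeats the link, rather than the loop visiting the same element twice.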
There are two occurrences of this link:
<a href="/en/story/90354-kandelaki-georgian-un-mission-yet-to-call-security-council-meeting" data-topic="UN Security Council Meeting" data-video="false" data-date="December 1 2014, 03:13PM" data-comment-count="0" data-thumbnail="http://www.tabula.ge/files/styles/tab_thumb_featured/public/photos/2014/12/giorgi-kandelaki.jpg?itok=uKdw1i9k" data-nid="90354">
Kandelaki: Georgian UN mission yet to call security council meeting </a>
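If each story should be printed only once, one possible approach is to remember which `href` values have already been seen and skip repeats. The sketch below assumes that duplicate links share the same `href`, as in the markup above. `unique_links` and `matching_links` are hypothetical helper names, and the second entry in `pairs` is invented purely for illustration.

```python
def unique_links(links):
    """Yield (href, title) pairs, skipping any href that was already seen."""
    seen = set()
    for href, title in links:
        if href in seen:
            continue
        seen.add(href)
        yield href, title


def matching_links(html, word):
    """Yield (href, title) for every <a> whose visible text contains word."""
    # Imported here so unique_links stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        title = a.get_text().strip()
        if word in title:
            yield a["href"], title


# Stand-in for matching_links(r.text, "mission"); the first href and title
# come from the page source above, the "/en/story/1" entry is made up.
story = "/en/story/90354-kandelaki-georgian-un-mission-yet-to-call-security-council-meeting"
headline = "Kandelaki: Georgian UN mission yet to call security council meeting"
pairs = [
    (story, headline),
    ("/en/story/1", "Another mission story"),
    (story, headline),
]

result = list(unique_links(pairs))
for href, title in result:
    print(title)
```

Against the live page, the loop would become `for href, title in unique_links(matching_links(r.text, "mission"))`. Keying on `href` rather than on the title keeps two different stories that happen to share a headline from being merged.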