Printout shows duplicate results

Time: 2015-06-16 19:46:06

Tags: python beautifulsoup

My script prints every result twice, and I can't work out why.

# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup as bs

word = ("mission")

with requests.Session() as s:
    r = s.get('http://www.tabula.ge/en')
    soup = bs(r.text)
    div = soup.find("div", {"class": "sets"})

    for i in div.find_all('li'):
        for text in i.find_all('a'):
            if word in text.encode('utf-8').strip():
                print text.get_text()

After running the script, the matching headline appears twice in the output:

Kandelaki: Georgian UN mission yet to call security council meeting

Kandelaki: Georgian UN mission yet to call security council meeting

1 Answer:

Answer 0 (score: 2)

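The text you are searching for appears twice in the page source, so the loop matches two separate links and prints both.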

To view the source:

  • Paste view-source:http://www.tabula.ge/en into your browser's address bar

  • Or right-click the page and choose "View page source"
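The same check can also be done from Python instead of the browser. The following is a minimal sketch that uses only the standard library, assuming the page has already been downloaded into a string. `AnchorTextCollector`, `count_anchor_texts` and `SAMPLE_HTML` are hypothetical names, and the sample markup is a made-up stand-in for the real page, not a copy of it.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the stripped text of every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # greater than 0 while inside an <a> element
        self._parts = []   # text fragments of the current <a>
        self.texts = []    # finished link texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._depth += 1

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.texts.append("".join(self._parts).strip())
                self._parts = []


def count_anchor_texts(html):
    """Return a Counter mapping each link text to its number of occurrences."""
    collector = AnchorTextCollector()
    collector.feed(html)
    return Counter(collector.texts)


# Made-up stand-in for the downloaded page (assumption, not the real markup).
SAMPLE_HTML = """
<div class="sets"><ul>
  <li><a href="/en/story/1">  Georgian UN mission story  </a></li>
  <li><a href="/en/story/2">Another headline</a></li>
  <li><a href="/en/story/1">Georgian UN mission story</a></li>
</ul></div>
"""

counts = count_anchor_texts(SAMPLE_HTML)
for text, n in counts.items():
    if n > 1:
        print(text, n)  # link texts that occur more than once
```

Any link text reported with a count above 1 will be printed that many times by a loop that prints every matching link.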

There are two occurrences of this link:

<a href="/en/story/90354-kandelaki-georgian-un-mission-yet-to-call-security-council-meeting" data-topic="UN Security Council Meeting" data-video="false" data-date="December 1 2014, 03:13PM" data-comment-count="0" data-thumbnail="http://www.tabula.ge/files/styles/tab_thumb_featured/public/photos/2014/12/giorgi-kandelaki.jpg?itok=uKdw1i9k" data-nid="90354">
                         Kandelaki: Georgian UN mission yet to call security council meeting                    </a>
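If each headline should be printed only once even though the page links to it twice, one option is to remember the texts already seen. This is a minimal sketch of that idea, not part of the original answer. `unique_in_order` is a hypothetical helper, and the `titles` list stands in for what `get_text().strip()` might return for each link.

```python
def unique_in_order(items):
    """Yield each item once, keeping the order of first appearance."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


# Hypothetical link texts; on the real page these would come from
# a.get_text().strip() for each <a> found inside the "sets" div.
titles = [
    "Kandelaki: Georgian UN mission yet to call security council meeting",
    "Some other headline",
    "Kandelaki: Georgian UN mission yet to call security council meeting",
]

word = "mission"
matches = [t for t in unique_in_order(titles) if word in t]
for title in matches:
    print(title)
```

Filtering on the stripped link text, rather than on the encoded tag, also keeps the match from being triggered by attribute values such as the `href` slug.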