Removing duplicate URLs with Python BeautifulSoup

Date: 2016-01-27 02:02:00

Tags: beautifulsoup

I want to remove duplicate URLs from a list of URLs. The list contains "http://www.naver.com/나눔글꼴.jpg", and that entry is repeated several times. Here is my code:

from bs4 import BeautifulSoup
import lxml
import re
import urllib.request

p = re.compile('나눔글꼴')
html = 'http://www.naver.com'
data = urllib.request.urlopen("http://www.naver.com").read()

soup = BeautifulSoup(data, 'lxml')
links = p.findall(str(soup))

i = set() 
for i in links:
    link = 'http://www.naver.com/' + str(i) + '.jpg'         
    print(link)

1 answer:

Answer 0 (score: 0)

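You forgot to pass the input to set(). In your code, i = set() creates an empty set, and the for loop then rebinds i to each match in links, so nothing is ever deduplicated. Build the set from links instead: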

soup = BeautifulSoup(data, 'lxml')
links = p.findall(str(soup))

# Build the set from the matches so duplicates collapse to one entry
unique_names = set(links)
for x in unique_names:
    link = 'http://www.naver.com/' + x + '.jpg'
    print(link)
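
Note that a set does not keep the original order of the matches. If order matters, one option is dict.fromkeys, which keeps the first occurrence of each item and preserves insertion order on Python 3.7+. The sketch below is only an illustration: sample_html is a made-up stand-in for str(soup) so the example runs without a network request, and build_unique_links is a hypothetical helper name, not something from the question.

```python
import re

# Hypothetical stand-in for str(soup); the real page content will differ.
sample_html = '<p>나눔글꼴</p><p>나눔글꼴</p><span>나눔글꼴</span>'

pattern = re.compile('나눔글꼴')
matches = pattern.findall(sample_html)  # three identical matches


def build_unique_links(names, base='http://www.naver.com/', ext='.jpg'):
    """Return one URL per distinct name, in first-seen order."""
    # dict.fromkeys drops later duplicates while keeping insertion order
    return [base + name + ext for name in dict.fromkeys(names)]


unique_links = build_unique_links(matches)
for link in unique_links:
    print(link)
```

With a single repeated match this prints one URL, the same result as the set-based version. The difference only shows up when there are several distinct matches and their order should be kept.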