I want to remove duplicate URLs from a file containing a list of URLs. It has entries like "http://www.naver.com/나눔글꼴.jpg" and they keep repeating. This is my code:
from bs4 import BeautifulSoup
import lxml
import re
import urllib.request
p = re.compile('나눔글꼴')
html = 'http://www.naver.com'
data = urllib.request.urlopen("http://www.naver.com").read()
soup = BeautifulSoup(data, 'lxml')
links = p.findall(str(soup))
i = set()
for i in links:
    link = 'http://www.naver.com/' + str(i) + '.jpg'
    print(link)
Answer 0 (score: 0):
You forgot to give set() any input, so the empty set was immediately overwritten when you reused i as the loop variable. Build the set from links and iterate over that instead:
soup = BeautifulSoup(data, 'lxml')
links = p.findall(str(soup))
i = set(links)
for x in i:
    link = 'http://www.naver.com/' + str(x) + '.jpg'
    print(link)
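If the duplicates actually come from a file with one URL per line, as the question describes, the same idea works directly on the file. Below is only a minimal sketch: the file name urls.txt is an assumption, and dict.fromkeys() is used instead of set() so the original order of the URLs is preserved.

# Minimal sketch, assuming a hypothetical file 'urls.txt' with one URL per line.
# dict.fromkeys() drops duplicates while keeping insertion order, unlike set().
with open('urls.txt', encoding='utf-8') as f:
    urls = [line.strip() for line in f if line.strip()]

unique_urls = list(dict.fromkeys(urls))
for url in unique_urls:
    print(url)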