Python: parsing characters in a list of URLs

Time: 2018-02-10 21:03:37

Tags: python parsing

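I am new to Python and am working with a list of URLs that I have scraped. Some of the URLs come back with a # symbol in front of them. How can I remove it?

Here is my code: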

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.census.gov/programs-surveys/popest.html').text

soup = BeautifulSoup(source, 'html.parser')

links = soup.find_all('a', href=True)


records = []
for results in links:
    url = results['href']
    records.append(url)

# Remove duplicate URLs from the records list
records = set(records)
records = list(records)

# Keep only the URLs that contain 'http'
filter_records = [k for k in records if 'http' in k]

import csv
with open('test.csv', 'w', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',')
    csv_writer.writerow(['Web Address'])
    # Write one URL per row
    csv_writer.writerows([record] for record in filter_records)

How do I remove the # from some of the results in the list?
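
One possible approach, sketched under the assumption that the unwanted character is a literal '#' at the very start of the string (the question does not show the exact output). The helper name strip_leading_hash and the sample URLs are hypothetical and stand in for filter_records:

```python
def strip_leading_hash(urls):
    """Return the URLs with any leading '#' characters removed."""
    # str.lstrip('#') removes every '#' at the start of the string
    # and leaves the rest of the URL untouched.
    return [url.lstrip('#') for url in urls]


# Hypothetical sample data standing in for filter_records
sample = [
    '#https://www.census.gov/data.html',
    'https://www.census.gov/about.html',
]
cleaned = strip_leading_hash(sample)
print(cleaned)
```

If this fits the data, the same call could be applied to filter_records before the CSV is written. Note that a '#' appearing later in a URL (a fragment) is left as-is by this sketch.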

0 answers:

No answers yet