Scraping multiple URLs from a CSV with Beautiful Soup & Python

Time: 2017-11-02 13:45:02

Tags: python csv url beautifulsoup

I need to scrape a list of URLs that are stored in a CSV file.

I'm new to Beautiful Soup.

1 answer:

Answer 0: (score: 1)

Assuming your urls.csv file looks like this:

https://stackoverflow.com;code site;
https://steemit.com;block chain social site;

The following code will work:

#!/usr/bin/python
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup  # required to parse html
import requests  # required to make requests

# read the file
with open('urls.csv', 'r') as f:
    csv_raw_cont = f.read()

# split into lines
split_csv = csv_raw_cont.split('\n')

# remove empty lines (remove('') would only drop the first one)
split_csv = [line for line in split_csv if line]

# specify the separator
separator = ";"

# iterate over each line
for each in split_csv:

    # specify the column index
    url_column_index = 0  # in our example csv file the url is the first column, so we use 0

    # get the url
    url = each.split(separator)[url_column_index]

    # fetch the content from the server
    html = requests.get(url).content

    # soup the fetched content (name a parser explicitly to avoid a warning)
    soup = BeautifulSoup(html, 'html.parser')

    # show the title from the soup
    print(soup.title.string)

Output:

Stack Overflow - Where Developers Learn, Share, & Build Careers
Steemit

More info: beautifulsoup, requests