How to parse URLs from a CSV with BeautifulSoup in Python

Date: 2018-09-26 10:41:15

Tags: python beautifulsoup

Below is my code. It works fine for a single given URL. I want to parse the URLs listed in a CSV file instead. Thanks in advance.

P.S. I am very new to Python.

The code below works fine for a single given URL:

import requests
import pandas
from bs4 import BeautifulSoup

baseurl = "https://www.xxxxxxxxx.com"

r = requests.get(baseurl)
c = r.content

soup = BeautifulSoup(c, "html.parser")

# collect the target divs and turn <br> tags into newlines
all = soup.find_all("div", {"class": "biz-us"})

for br in soup.find_all("br"):
    br.replace_with("\n")

Here is the code I tried for reading the URLs from a CSV:

import csv
import requests
import pandas
from bs4 import BeautifulSoup

with open("input.csv", "rb") as f:
    reader = csv.reader(f)

    for row in reader:
        url = row[0]

    r = requests.get(url)
    c = r.content
    soup = BeautifulSoup(c, "html.parser")

    all = soup.find_all("div", {"class": "biz-country-us"})

    for br in soup.find_all("br"):
        br.replace_with("\n")

2 Answers:

Answer 0 (score: 0)

It looks like you need to use the loop properly and collect the URLs into a list. Give this a try:

import requests
import pandas
from bs4 import BeautifulSoup

df1 = pandas.read_csv("input.csv", skiprows=0)  # assuming the headers are in the first row

urls = df1['url_column_name'].tolist()  # get the urls as a list

for url in urls:
    r = requests.get(url)
    c = r.content
    soup = BeautifulSoup(c, "html.parser")

    all = soup.find_all("div", {"class": "biz-country-us"})

    for br in soup.find_all("br"):
        br.replace_with("\n")
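
If you also want to keep what is scraped from each page, one option is to collect the extracted text per URL and write it back out with pandas. The following is only a minimal sketch under the same assumptions as above (a hypothetical input.csv with a url_column_name column, a hypothetical output.csv file name, and a biz-country-us div on each page):

import requests
import pandas
from bs4 import BeautifulSoup

df1 = pandas.read_csv("input.csv")   # hypothetical input file, one URL per row
results = []                         # one dict per URL

for url in df1['url_column_name'].tolist():
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    # turn <br> tags into newlines before extracting text
    for br in soup.find_all("br"):
        br.replace_with("\n")

    div = soup.find("div", {"class": "biz-country-us"})
    results.append({"url": url, "text": div.get_text(strip=True) if div else ""})

pandas.DataFrame(results).to_csv("output.csv", index=False)   # hypothetical output file

Storing one dict per URL keeps the URL/text pairing explicit and makes the final DataFrame trivial to build.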

Answer 1 (score: 0)

Suppose you have a csv file named linklists.csv with a header Links. You can then work with all of the links available under the Links header as shown below:

import csv
import requests

with open("linklists.csv") as infile:
    reader = csv.DictReader(infile)           # each row becomes a dict keyed by the header
    for link in reader:
        res = requests.get(link['Links'])     # fetch the URL stored under the "Links" column
        print(res.url)
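
If you want to run the BeautifulSoup parsing from the question on every link read this way, a minimal sketch (assuming the same linklists.csv with a Links header and the biz-country-us div class used above) could look like this:

import csv
import requests
from bs4 import BeautifulSoup

with open("linklists.csv") as infile:
    reader = csv.DictReader(infile)
    for link in reader:
        res = requests.get(link['Links'])
        soup = BeautifulSoup(res.content, "html.parser")

        # same parsing as in the question, now run once per URL in the csv
        for br in soup.find_all("br"):
            br.replace_with("\n")

        divs = soup.find_all("div", {"class": "biz-country-us"})
        print(res.url, len(divs))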