Below is my code. It works fine for a single given URL, but I want to read the URLs to parse from a CSV file. Thanks in advance.
P.S. I am very new to Python.
The code below works fine for a single given URL:
import requests
import pandas
from bs4 import BeautifulSoup

baseurl = "https://www.xxxxxxxxx.com"
r = requests.get(baseurl)
c = r.content
soup = BeautifulSoup(c, "html.parser")
all = soup.find_all("div", {"class": "biz-us"})
for br in soup.find_all("br"):
    br.replace_with("\n")
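For reference, here is a minimal sketch of what pulling the text out of the matched divs might look like after the <br> replacement (the get_text() call and the printing are illustrative additions, not part of the original code):

import requests
from bs4 import BeautifulSoup

# Hypothetical single-URL example; the site URL and class name come from the question.
baseurl = "https://www.xxxxxxxxx.com"
soup = BeautifulSoup(requests.get(baseurl).content, "html.parser")

# Replace <br> tags with newlines so the extracted text keeps line breaks.
for br in soup.find_all("br"):
    br.replace_with("\n")

# Print the text of every matching div.
for div in soup.find_all("div", {"class": "biz-us"}):
    print(div.get_text())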
Here is the code I tried for reading the URLs from a CSV:
import csv
import requests
import pandas
from bs4 import BeautifulSoup

with open("input.csv", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        url = row[0]
        r = requests.get(url)
        c = r.content
        soup = BeautifulSoup(c, "html.parser")
        all = soup.find_all("div", {"class": "biz-country-us"})
        for br in soup.find_all("br"):
            br.replace_with("\n")
Answer 0 (score: 0)
It looks like you need to loop properly over an array of the URLs. Try this:
import csv
import requests
import pandas
from bs4 import BeautifulSoup

df1 = pandas.read_csv("input.csv", skiprows=0)  # assuming headers are in the first row
urls = df1['url_column_name'].tolist()          # get the urls as a list

for i in range(len(urls)):
    r = requests.get(urls[i])
    c = r.content
    soup = BeautifulSoup(c, "html.parser")
    all = soup.find_all("div", {"class": "biz-country-us"})
    for br in soup.find_all("br"):
        br.replace_with("\n")
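As a follow-up, if some of the URLs in the CSV might be unreachable, a slightly more defensive version of the same loop could look like this (the column name url_column_name is still a placeholder, and the timeout/error handling is an assumption added here, not part of the original answer):

import pandas
import requests
from bs4 import BeautifulSoup

df1 = pandas.read_csv("input.csv")              # assumes the URL column has a header
urls = df1['url_column_name'].tolist()          # placeholder column name

results = []
for url in urls:
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()                    # skip URLs that return an error status
    except requests.RequestException as e:
        print(f"Skipping {url}: {e}")
        continue
    soup = BeautifulSoup(r.content, "html.parser")
    for br in soup.find_all("br"):
        br.replace_with("\n")
    for div in soup.find_all("div", {"class": "biz-country-us"}):
        results.append(div.get_text())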
Answer 1 (score: 0)
Suppose you have a csv file named linklists.csv with a header Links. You can then work through all of the links available under the Links header as shown below:
import csv
import requests

with open("linklists.csv") as infile:
    reader = csv.DictReader(infile)
    for link in reader:
        res = requests.get(link['Links'])
        print(res.url)
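To connect this back to the scraping code from the question, a sketch of the same DictReader loop feeding each link into BeautifulSoup might look like this (the Links header and the biz-country-us class are taken from the question and this answer; the printing is illustrative):

import csv
import requests
from bs4 import BeautifulSoup

with open("linklists.csv", newline="") as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        # Fetch each URL listed under the Links header and parse it.
        res = requests.get(row['Links'])
        soup = BeautifulSoup(res.content, "html.parser")
        for br in soup.find_all("br"):
            br.replace_with("\n")
        for div in soup.find_all("div", {"class": "biz-country-us"}):
            print(div.get_text())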