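I want to parse a number of URLs on a website. I created a text file containing all the links I want to parse. How can I read these URLs from the text file one at a time in my Python program?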
import json

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://www.example.com").content, "html.parser")
for d in soup.select("div[data-selenium=itemDetail]"):
    # Follow the link of each item to its detail page
    url = d.select_one("h3[data-selenium] a")["href"]
    upc = BeautifulSoup(requests.get(url).content, "html.parser").select_one("span.upcNum")
    if upc:
        data = json.loads(d["data-itemdata"])
        text = upc.text.strip()
        print(text)
        # Append one "data,upc" line per item; the file is closed automatically
        with open('/Users/Burak/Documents/new_urllist.txt', 'a') as outFile:
            outFile.write(str(data))
            outFile.write(",")
            outFile.write(str(text))
            outFile.write("\n")
Contents of urllist.txt:
https://www.example.com/category/1
category/2
category/3
category/4
Thanks in advance.
Answer 0 (score: 0)
Use a context manager:
with open("/file/path") as f:
    # One URL per line; strip the newline and skip blank lines
    urls = [u.strip() for u in f if u.strip()]
This gives you a list of all the URLs in the file, which you can then loop over and request one at a time.
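To tie this back to the question, here is a minimal sketch of reading the file and processing each URL in turn. The helper names (`read_urls`, `fetch_upc`, `process_all`) and the `base_url` parameter are hypothetical and not part of the original code. Resolving the relative entries such as `category/2` against `https://www.example.com/` is an assumption about what the file is meant to contain; the `span.upcNum` selector is taken from the question.

```python
from urllib.parse import urljoin


def read_urls(path, base_url=None):
    """Return the non-empty lines of the file as a list of URLs."""
    urls = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumption: relative entries are resolved against base_url.
            # Absolute URLs pass through urljoin unchanged.
            urls.append(urljoin(base_url, line) if base_url else line)
    return urls


def fetch_upc(url):
    """Download one page and return the text of span.upcNum, or None."""
    # Imported here so that read_urls can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    tag = soup.select_one("span.upcNum")
    return tag.text.strip() if tag else None


def process_all(path, base_url=None, fetch=fetch_upc):
    """Call fetch on every URL in the file, one by one."""
    results = []
    for url in read_urls(path, base_url):
        upc = fetch(url)
        print(url, upc)
        results.append((url, upc))
    return results
```

Calling `process_all("urllist.txt", "https://www.example.com/")` would then request each URL in order. The `fetch` argument is only there so the per-URL step can be swapped out, for example for the item-list parsing from the question.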