Question

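Please, I'd like some direction on a project, because I'm lost and really don't know where to start.

I'm new to Python, but I've already written a web scraping script that uses lxml and XPath to pull data out of the HTML DOM and collect information from some websites.

But now a client has given me a challenge...

The website uses frames, and I have to get the data from them =( and I don't know how to deal with that...

To make things more complicated, the site requires a login :(

Could someone give me some pointers, such as where I should start?

Is it possible to scrape data from a website that displays its data in frames?

Here is the URL: https://www.bulkshared.com/online-ordering

I want to point my script at the "Pantry" section, but the URL doesn't show the path =(

What kind of script would you recommend? I'd like to use Python, but do I have to use BS? XPath? Selenium?

Could someone spare some time to help me?

Thank you very much for your time, guys!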

Answer 1


```python
import csv
import re

import requests
from bs4 import BeautifulSoup


def Login(url):
    with requests.Session() as req:
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        # The values needed for the login request are embedded in an inline script.
        script = soup.find("script", type="text/javascript").text
        collectionId = re.search(r'collectionId":"(.*?)"', script).group(1)
        metaSiteId = re.search(r'metaSiteId":"(.*?)"', script).group(1)
        svSession = re.search(r'svSession":"(.*?)"', script).group(1)
        data = {
            'email': 'test@test.com',
            'password': 'test123',
            'collectionId': collectionId,
            'metaSiteId': metaSiteId,
            'appUrl': 'https://www.bulkshared.com/online-ordering',
            'svSession': svSession
        }
        # Log in, then request the full menu from the API in the same session.
        r = req.post(
            "https://www.bulkshared.com/_api/wix-sm-webapp/member/login", data=data)
        r = req.get(
            "https://api.wixrestaurants.com/v2/organizations/5716166580714419/full").json()
        return r


def Sorter():
    data = Login("https://www.bulkshared.com/")
    with open("result.csv", 'w', newline="", encoding="UTF-8") as f:
        writer = csv.writer(f)
        # Three columns, matching the rows written below.
        writer.writerow(["Name", "Description", "Price"])
        for item in data["menu"]["items"]:
            title = item["title"]["en_AU"]
            try:
                price = item["price"]
            except KeyError:
                price = "N/A"
            try:
                description = item["description"]["en_AU"].strip()
            except (KeyError, AttributeError):
                description = "N/A"
            writer.writerow([title, description, price])


Sorter()
```

Note: after writing the code, I found that the API is completely public and does not require any login session information.

So you can call it directly.
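Since the note above says the API can be called without a session, the login step can in principle be dropped. The following is a minimal sketch of that idea, not the answerer's own code: it reuses the endpoint URL from the answer and assumes the response has the same shape the answer's code reads (`data["menu"]["items"]`, with `title`, `description`, and `price` fields). The helper names `fetch_menu`, `menu_rows`, and `write_csv` are hypothetical, and it uses the standard library's `urllib` instead of `requests` only to stay dependency-free.

```python
import csv
import json
import urllib.request

# Endpoint copied from the answer; the payload layout is an assumption
# based on the fields that the answer's code accesses.
API_URL = "https://api.wixrestaurants.com/v2/organizations/5716166580714419/full"


def fetch_menu(url=API_URL):
    """Download the organization JSON with a plain GET (no login, no session)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def menu_rows(data, locale="en_AU"):
    """Convert the payload into (name, description, price) tuples."""
    rows = []
    for item in data["menu"]["items"]:
        title = item["title"][locale]
        # Missing descriptions or prices fall back to "N/A".
        description = (item.get("description") or {}).get(locale, "N/A").strip()
        price = item.get("price", "N/A")
        rows.append((title, description, price))
    return rows


def write_csv(rows, path="result.csv"):
    """Write the rows to a CSV file with a matching three-column header."""
    with open(path, "w", newline="", encoding="UTF-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Description", "Price"])
        writer.writerows(rows)
```

Calling `write_csv(menu_rows(fetch_menu()))` would then produce `result.csv`, provided the endpoint is still reachable and still public. Keeping the download separate from the row conversion means the conversion can be checked against a small hand-written dictionary without touching the network.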
