Question

I have this website: https://www.adbc.gov.ae/BusinessActivityInfo/BusinessActivity.aspx?culture=en-US

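This website has two dropdown menus: Category and SubCategory. After a Category and a SubCategory are selected, the page displays a table, and each Category/SubCategory combination shows a different table. How can I scrape that table for every Category and SubCategory?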
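Whatever ends up driving the dropdowns, the table that appears after a selection is ordinary HTML, so turning it into rows can be written and tested separately from the browser automation. A minimal sketch, assuming BeautifulSoup and a results table that can be located by an `id`; the helper name `table_rows` and the `table_id` value are hypothetical, and the real id has to be looked up in the rendered page with the browser's developer tools:

```python
from bs4 import BeautifulSoup


def table_rows(html, table_id):
    """Parse one HTML table into a list of dicts keyed by its header cells.

    table_id is an assumption: read the real id from the rendered page.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # Treat the first row as the header (works whether it uses <th> or <td>).
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if cells:  # skip empty spacer rows
            records.append(dict(zip(headers, cells)))
    return records
```

The `html` argument can come from `driver.page_source` after both dropdowns have been set in Selenium. Note that `find_all("tr")` is recursive, so a table that nests other tables (or has a pager row) may need extra filtering.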

Here is what I have tried so far:

import requests
from bs4 import BeautifulSoup

url = 'https://www.adbc.gov.ae/BusinessActivityInfo/BusinessActivity.aspx?culture=en-US'

req = requests.get(url)
soup = BeautifulSoup(req.text, "lxml")

# Read the options of the Category dropdown
content = soup.find("select", {"name": "ddlNatureId"})
options = content.find_all("option")
options1 = [y.text for y in options]
options1

Output:

['',
 'ADVOCATE OFFICES',
 'AGENCIES',
 'AGRICULTURE',
 'AGRICULTURE, LIVESTOCK AND FISHERIES ACTIVITIES',
 'ANIMAL HUSBANDRY',
 'ANIMAL SHELTERING SERVICES',
 'ART GALLERY',
 'AUDITING OFFICES',
 'BAKERIES AND SWEETS',
...
]
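The visible text of an option and its `value` attribute are not necessarily the same, and Selenium's `Select.select_by_value()` matches the `value` attribute only (`select_by_visible_text()` matches the text). A minimal sketch that lists both for a given `<select>`, so it is clear which string to pass to which method; the helper name `option_pairs` is hypothetical:

```python
from bs4 import BeautifulSoup


def option_pairs(html, select_name):
    """Return (value, text) pairs for the non-empty <option>s of a <select>."""
    soup = BeautifulSoup(html, "html.parser")
    select = soup.find("select", {"name": select_name})
    if select is None:
        return []
    pairs = []
    for option in select.find_all("option"):
        value = option.get("value", "")  # raw attribute, '' if it is absent
        text = option.get_text(strip=True)
        if value or text:  # skip the blank placeholder option
            pairs.append((value, text))
    return pairs
```

For example, `option_pairs(requests.get(url).text, "ddlNatureId")` would show whether `'ADVOCATE OFFICES'` is the value, the text, or both.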

Update:

Here is what I have so far. I found that Selenium can be used to select a value from a dropdown list. Here is my code:

Libraries:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver.support.ui import Select
import time
import sys
from bs4 import BeautifulSoup
import requests

Set up the webdriver:

from selenium.webdriver.chrome.service import Service

url = 'https://www.adbc.gov.ae/BusinessActivityInfo/BusinessActivity.aspx?culture=en-US'
chrome_driver_path = 'D:\\work\\crawl data\\selenium_project\\chromedriver.exe'

chrome_options = Options()
chrome_options.add_argument('--headless')

# Selenium 4 takes the driver path through a Service object
# (the executable_path argument was removed in recent releases)
browser = webdriver.Chrome(
    service=Service(chrome_driver_path), options=chrome_options
)

Load the website and scrape the data:

with browser as driver:
    # Set the timeout
    wait = WebDriverWait(driver, 10)

    # Retrieve the URL in the headless browser
    driver.get(url)

    # find select box
    search = Select(driver.find_element(By.ID, "ddlNatureId"))
    search.select_by_value('ADVOCATE OFFICES')

    req = requests.get(url)
    soup = BeautifulSoup(req.text, "lxml")

    price=soup.find("select",{"name":"ddlSubCategId"})
    options = price.find_all("option")
    options1 = [y.text for y in options]

    driver.close()

print(options1)

Output:

[]

Expected output (the list of SubCategories when the Category is 'ADVOCATE OFFICES'):

['', 'Advertising Agent', 'Advocate Offices', 'Agricultural Equipment And Tools Rental', 'Air Transport', 'Agents', ... ]

My problem now is that I cannot get the SubCategory data after selecting 'ADVOCATE OFFICES'. How can I fix this?

Retrieving table data from two dropdown menus

0 answers: