How do I extract/scrape option values from a dynamic dropdown list in Python?

Time: 2020-10-18 22:31:44

Tags: python selenium web-scraping webdriver

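I am trying to extract data from a web page that loads the options of its dropdown lists dynamically, based on what the user selects. I am using Selenium WebDriver to read the data from the dropdowns. Please see the screenshots below.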

Dropdown 1 - State

Dropdown 2 - City

Dropdown 3 - Station

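The options of the City dropdown are loaded after a state is selected, and the options of the Station dropdown are loaded after a city is selected.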

So far I have been able to extract the station names using Selenium.

State Options

City Options

Option values from station dropdown

Can someone help me extract the siteId for each state and city?

1 answer:

Answer 0: (score: 0)

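Try the following approach using Python's requests library. It is simple, direct, reliable, and fast, and it needs less code than driving a browser. I found the API URL by inspecting the Network tab in Google Chrome while the site was loading.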

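What the script below does:

  1. First, it sends a POST request to the API URL with the payload (the payload is essential for the POST request to succeed) and receives the returned data.
  2. Once the data has arrived, the script parses the JSON with json.loads.
  3. Finally, it iterates over the list of stations one by one and prints details such as the state name, city name, station name, and station site ID.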
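The payload mentioned in step 1 looks opaque, but it is base64-encoded JSON holding what looks like a millisecond timestamp and a time zone offset of -330 minutes (the value JavaScript's getTimezoneOffset() returns for Indian Standard Time). The sketch below decodes the string captured from the network request and rebuilds one in the same format. `decode_payload` and `build_payload` are hypothetical helper names, and whether the server accepts a freshly generated timestamp is an untested assumption.

```python
import base64
import json
import time

# Payload string copied from the captured network request.
CAPTURED_PAYLOAD = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='


def decode_payload(payload):
    """Base64-decode the payload and parse the JSON inside it."""
    return json.loads(base64.b64decode(payload).decode('utf-8'))


def build_payload(time_ms=None, tz_offset_minutes=-330):
    """Build a payload in the same format (hypothetical helper).

    Assumption: the server accepts any recent timestamp. This is untested.
    """
    if time_ms is None:
        time_ms = int(time.time() * 1000)  # current time in milliseconds
    body = json.dumps(
        {'time': time_ms, 'timeZoneOffset': tz_offset_minutes},
        separators=(',', ':'),  # compact form, matching the captured payload
    )
    return base64.b64encode(body.encode('utf-8')).decode('ascii')


print(decode_payload(CAPTURED_PAYLOAD))
# {'time': 1603104573463, 'timeZoneOffset': -330}
```

If the captured payload ever stops working, passing `build_payload()` as the `data` argument of the POST request may be worth trying before re-capturing a payload from the browser.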

Network tab

Output of the code below:

Output of python script

import json

import requests


def scrap_aqi_site_id():
    # API URL taken from the Network tab of the browser
    url = 'https://app.cpcbccr.com/aqi_dashboard/aqi_station_all_india'
    # Payload copied from the captured network request
    payload = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='
    # POST the payload to the API; verify=False skips TLS certificate verification
    response = requests.post(url, data=payload, verify=False)
    result = json.loads(response.text)  # parse the JSON response
    for state in result['stations']:  # one entry per state
        print('=' * 120)
        print('Scraping station data for state: ' + state['stateID'])
        for station in state['stationsInCity']:  # every station listed for this state
            print('-' * 100)
            print('Scraping data for city and its station: City (' + station['cityID'] + ') & station (' + station['name'] + ')')
            print('City: ' + station['cityID'])
            print('Station Name: ' + station['name'])
            print('Station Site Id: ' + station['id'])
            print('-' * 100)
        print('Scraping of data for state (' + state['stateID'] + ') is completed, now moving on to the next one...')
        print('=' * 120)


scrap_aqi_site_id()
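Since the question asks for the siteId of each state and city, it may be more convenient to collect the values instead of printing them. The sketch below flattens a parsed response into one row per station and renders the rows as CSV. It assumes the same keys the script above reads (`stations`, `stateID`, `stationsInCity`, `cityID`, `name`, `id`); `flatten_stations`, `rows_to_csv`, and the sample data are hypothetical and only illustrate the shape.

```python
import csv
import io

FIELDS = ['state', 'city', 'station', 'site_id']


def flatten_stations(result):
    """Turn the nested response into one flat dict per station."""
    rows = []
    for state in result.get('stations', []):
        for station in state.get('stationsInCity', []):
            rows.append({
                'state': state['stateID'],
                'city': station['cityID'],
                'station': station['name'],
                'site_id': station['id'],
            })
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample in the assumed response shape, not real API output.
sample = {
    'stations': [
        {
            'stateID': 'State A',
            'stationsInCity': [
                {'cityID': 'City X', 'name': 'Station 1', 'id': 'site_001'},
                {'cityID': 'City Y', 'name': 'Station 2', 'id': 'site_002'},
            ],
        },
    ],
}

rows = flatten_stations(sample)
print(rows_to_csv(rows))
```

With the real response, `flatten_stations(result)` could be called right after `json.loads(response.text)` in the script above.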