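I'm trying to scrape some data from a page that can only be reached by following three steps: click some text to reveal checkboxes, tick one checkbox, then click a button that takes me to the next page, which is where I scrape the data. I'm using Python's Selenium package for the three clicks, then passing driver.page_source to BeautifulSoup to scrape the data.
The page is here: https://www.betonline.ag/sportsbook. On the left, you can click the list of sports to reveal the checkboxes. This is the part Selenium seems unable to do: I can't find any element in the HTML that can be clicked. If I do this step manually, the rest of the script works fine. Building the XPath for the checkbox was a bit tricky, but with bs4 and this awesome function by ergoithz, xpath_soup, I can complete step 2. The rest is easy.
Question: How can I use Selenium to perform the first step, clicking "Baseball" or the "+" to reveal the checkbox list?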
My code so far is provided below.
These screenshots lay out the flow in more detail:
import pandas as pd
import bs4
from selenium import webdriver


def xpath_soup(element):
    ##### Awesome function by ergoithz (removed docstring to save space)
    # https://gist.github.com/ergoithz/6cf043e3fdedd1b94fcf
    components = []
    child = element if element.name else element.parent
    for parent in child.parents:  # type: bs4.element.Tag
        siblings = parent.find_all(child.name, recursive=False)
        components.append(
            child.name if 1 == len(siblings) else '%s[%d]' % (
                child.name,
                next(i for i, s in enumerate(siblings, 1) if s is child)
            )
        )
        child = parent
    components.reverse()
    return '/%s' % '/'.join(components)


# Call up driver and load the page
driver = webdriver.Chrome()
driver.get('https://www.betonline.ag/sportsbook')
soup = bs4.BeautifulSoup(driver.page_source, 'html.parser')
divs = soup.find_all('div', class_='mainSportsLinks')

# Get 'south korea baseball' index (inelegant but it works)
table = divs[0].find_all('a')
sk_indices = []
for index, row in enumerate(table):
    try:
        #if bool(re.match(reg, row['cfg'])):
        if 'South Korea' in row['cfg']:
            sk_indices.append(index)
            print(index, row)
    except KeyError:
        # This <a> tag has no cfg attribute; skip it
        pass
sk_index = sk_indices[0]

# use awesome xpath_soup function to create an xpath to find checkbox
# function by ergoithz @ https://gist.github.com/ergoithz/6cf043e3fdedd1b94fcf
bs4_tag = table[sk_index].parent.parent.find_all('div')[1].input
xpath_text = xpath_soup(bs4_tag)

### -----------------------------
### Missing Step 1 - how to click "Baseball" to expose the checkbox
### -----------------------------

# Step 2 - Click checkbox
button = driver.find_element_by_xpath(xpath_text)
#print(button.get_attribute("type"))
button.click()

# Step 3 - Click 'View Selected'
view_selected = driver.find_element_by_id('viewSelectedId')
view_selected.click()

# Pass to bs4 for scraping
page_source = driver.page_source
soup_bets = bs4.BeautifulSoup(page_source, 'html.parser')
driver.quit()

# Scrape using pandas read_html ...
#df_raw = pd.read_html(page_source, match='South Korea KBO')[0]
#df = df_raw.dropna(thresh=3).dropna(thresh=3, axis=1)
#df = df.loc[1:]  # eliminate blank first column
If I try to skip step 1, I get the stack trace below. I'm not allowed to interact with the checkbox unless the list has been expanded:
---------------------------------------------------------------------------
ElementNotInteractableException Traceback (most recent call last)
<ipython-input-7-3ddcc15ccda8> in <module>
5 button = driver.find_element_by_xpath(xpath_text)
6 #print(button.get_attribute("type"))
----> 7 button.click()
8
9 # Step 3 - Click 'View Selected'
C:\python38\lib\site-packages\selenium\webdriver\remote\webelement.py in click(self)
78 def click(self):
79 """Clicks the element."""
---> 80 self._execute(Command.CLICK_ELEMENT)
81
82 def submit(self):
C:\python38\lib\site-packages\selenium\webdriver\remote\webelement.py in _execute(self, command, params)
631 params = {}
632 params['id'] = self._id
--> 633 return self._parent.execute(command, params)
634
635 def find_element(self, by=By.ID, value=None):
C:\python38\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
319 response = self.command_executor.execute(driver_command, params)
320 if response:
--> 321 self.error_handler.check_response(response)
322 response['value'] = self._unwrap_value(
323 response.get('value', None))
C:\python38\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
240 alert_text = value['alert'].get('text')
241 raise exception_class(message, screen, stacktrace, alert_text)
--> 242 raise exception_class(message, screen, stacktrace)
243
244 def _value_or_default(self, obj, key, default):
ElementNotInteractableException: Message: element not interactable
(Session info: chrome=81.0.4044.129)
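The traceback is consistent with the checkbox being present in the DOM but not displayed until the sport list is expanded. As a minimal sketch of guarding the click, the helper below checks `is_displayed()` (a real WebElement method) before clicking. The name `click_after_expanding` and the `expand` callback are hypothetical, introduced only for illustration; `expand` stands in for whatever performs step 1.

```python
def click_after_expanding(checkbox, expand):
    """Click `checkbox`, calling `expand()` first if it is not displayed.

    `checkbox` is any object with Selenium's WebElement interface
    (is_displayed() and click()). `expand` is a hypothetical zero-argument
    callable that performs step 1, e.g. clicking the sport heading.
    Returns True if expand() had to be called, False otherwise.
    """
    expanded = False
    if not checkbox.is_displayed():
        # The element exists but is hidden; reveal it before clicking.
        expand()
        expanded = True
    checkbox.click()
    return expanded
```

In the script above this might be called as `click_after_expanding(button, expand_baseball)`, where `expand_baseball` is a function you would still need to write. If the list expands with an animation, an explicit wait between the two clicks may also be needed.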
Answer 0 (score: 0)
You can use the XPath below to click Basketball:
driver.find_element_by_xpath('//a[@cfg="{type:\'h2h\',level1:\'Basketball\'}"]').click()
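The question asks about Baseball while the XPath above targets Basketball, so the same pattern can presumably be parameterized by sport name. The sketch below assumes the Baseball link carries a `cfg` attribute of the same shape (`{type:'h2h',level1:'Baseball'}`), which has not been verified against the page. The helper names `sport_link_xpath` and `click_sport` are hypothetical; `WebDriverWait` and `expected_conditions.element_to_be_clickable` are standard Selenium utilities.

```python
def sport_link_xpath(level1, bet_type="h2h"):
    """Build an XPath matching the <a> whose cfg attribute names the sport.

    Assumes the cfg attribute has the exact form shown in the answer.
    """
    cfg = "{type:'%s',level1:'%s'}" % (bet_type, level1)
    return '//a[@cfg="%s"]' % cfg


def click_sport(driver, level1, timeout=10):
    """Wait until the sport link is clickable, then click it."""
    # Imported here so sport_link_xpath works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, sport_link_xpath(level1))
    link = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable(locator)
    )
    link.click()
    return link


print(sport_link_xpath("Baseball"))
# prints: //a[@cfg="{type:'h2h',level1:'Baseball'}"]
```

With this in place, step 1 in the question's script could be tried as `click_sport(driver, "Baseball")` before the checkbox click. The explicit wait is a design choice to avoid clicking before the sidebar has finished loading.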