How to web-scrape text hidden inside accordions with Python

Time: 2018-12-16 08:48:16

Tags: python selenium web-scraping

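I wrote a simple script that returns specific information from an Australian betting website.

It works well, but I'm having a lot of trouble opening each accordion dropdown. My script is below.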

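from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

chrome_path = r"C:\Users\Tom\Desktop\chromedriver.exe"
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(chrome_path))
driver.get("https://pointsbet.com.au/basketball/NBA")

time.sleep(2)  # crude wait for the page to render

# Open the first event on the competition page
driver.find_element(By.XPATH, "/html/body/div[1]/div[2]/sport-competition-component/div[1]/div[2]/div[1]/div/event-list/div[1]/event/div/header/div[1]/h2/a").click()
time.sleep(2)

posts = driver.find_elements(By.CLASS_NAME, "market")
# Open the file once; this writes plain text despite the .xls extension
with open("output.xls", mode="a", encoding="utf-8") as f:
    for post in posts:
        print(post.text)
        f.write(post.text + "\n")

driver.quit()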

The script prints all the visible text inside elements with the class name `market`.

The output is:

HEAD TO HEAD
Brooklyn Nets
1.29
Atlanta Hawks
3.78
LINE
Brooklyn Nets -8.0
1.95
Atlanta Hawks +8.0
1.89
TOTAL POINTS
Over 227.0
1.91
Under 227.0
1.91

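My problem is that some of the text is hidden under the accordions (see screenshot).

For example, I can't scrape the data under the "Double Result" heading.

Once the accordion has been clicked open, the script works fine.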
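One alternative worth trying before clicking anything: Selenium's `WebElement.text` returns only rendered (visible) text, which is why collapsed panels come back empty, whereas `get_attribute("textContent")` reads the DOM text regardless of visibility. This is a sketch that assumes the collapsed markets are already in the DOM and merely hidden with CSS. If the site only inserts the panel contents after a click (common with Angular-style components such as the `sport-competition-component` in the XPath), this returns nothing extra and the accordions still have to be opened. `market_texts` is a hypothetical helper, not part of Selenium.

```python
# Locator strategy string; same value as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def market_texts(driver, selector=".market"):
    """Hypothetical helper: text of every element matching `selector`, hidden or not.

    WebElement.text only includes rendered text, so it misses collapsed
    accordion panels. textContent does not depend on visibility, as long
    as the nodes already exist in the page.
    """
    texts = []
    for element in driver.find_elements(CSS, selector):
        raw = element.get_attribute("textContent") or ""
        # textContent keeps the raw HTML whitespace; collapse it to single spaces
        texts.append(" ".join(raw.split()))
    return texts
```

Usage would be `for text in market_texts(driver): print(text)` in place of the `post.text` loop.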

I wrote some code to click the accordions automatically, but unfortunately the XPath changes for every match.
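Absolute XPaths such as `/html/body/div[1]/...` break whenever the page layout shifts. A sketch of a more stable approach, under assumptions: locate the accordion header by a class name and its visible title instead of by position. The `accordion-toggle` class is taken from the selector in the question's update; whether a market title like "Double Result" is the full text of that same element is an assumption to check in the browser's inspector (extra child text such as a count would stop the exact match). `xpath_literal`, `accordion_xpath` and `open_accordion` are hypothetical helpers.

```python
XPATH = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH


def xpath_literal(text):
    """Hypothetical helper: quote `text` as an XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Contains both quote types: rebuild it with concat()
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def accordion_xpath(title, css_class="accordion-toggle"):
    """Hypothetical helper: XPath for the element with class `css_class` whose text is `title`.

    concat(' ', @class, ' ') lets contains() match a whole class token,
    and normalize-space() ignores leading, trailing and repeated whitespace.
    """
    return (
        f"//*[contains(concat(' ', normalize-space(@class), ' '), ' {css_class} ')"
        f" and normalize-space() = {xpath_literal(title)}]"
    )


def open_accordion(driver, title):
    """Hypothetical helper: click the accordion header whose text is `title`."""
    driver.find_element(XPATH, accordion_xpath(title)).click()
```

For example, `open_accordion(driver, "Double Result")` would click that header, regardless of where it sits in the page.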

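Does anyone know how to automatically click all the accordions (without knowing the element details), or does anyone have an alternative solution?

Any help is welcome, thanks.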

Update:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time


chrome_path = r"C:\Users\Tom\Desktop\chromedriver.exe"

d = webdriver.Chrome(service=Service(chrome_path))
d.get("https://pointsbet.com.au/basketball/NCAA-March-Madness")

time.sleep(2)

d.find_element(By.XPATH, "/html/body/div[1]/div[2]/sport-competition-component/div[1]/div[2]/div[1]/div/event-list/div[1]/event/div/header/div[1]/h2/a").click()
time.sleep(2)

# Wait until at least one accordion toggle is clickable, then collect all of them
toggle = (By.CSS_SELECTOR, ".h2.accordion-toggle.event-name")
WebDriverWait(d, 10).until(EC.element_to_be_clickable(toggle))
expandables = d.find_elements(*toggle)
for item in expandables:
    item.click()  # this is where the "element not interactable" error below is raised


posts = d.find_elements(By.CLASS_NAME, "market")
with open("output.xls", mode="a", encoding="utf-8") as f:
    for post in posts:
        print(post.text)
        f.write(post.text + "\n")

d.quit()

Error:

Traceback (most recent call last):
  File "C:\Users\Tom\Desktop\Python test\points1 - Copy.py", line 21, in <module>
    item.click()
  File "C:\Users\Tom\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\remote\webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Users\Tom\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\Tom\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Tom\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: element not interactable
  (Session info: chrome=73.0.3683.86)
  (Driver info: chromedriver=2.43.600210 (68dcf5eebde37173d4027fa8635e332711d2874a),platform=Windows NT 10.0.17134 x86_64)
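The traceback says one of the matched elements is not interactable. `find_elements` returns every match, which can include copies that are hidden (for example a duplicate header used by another layout) or outside the viewport, and a native `click()` on such an element raises. A sketch of one common workaround, assuming the hidden matches can safely be ignored: skip elements whose `is_displayed()` is false and scroll the rest into view before clicking. `expand_visible` is a hypothetical helper; its default selector is the one from the update.

```python
CSS = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR


def expand_visible(driver, selector=".h2.accordion-toggle.event-name"):
    """Hypothetical helper: click every *displayed* match; return how many were clicked."""
    clicked = 0
    for element in driver.find_elements(CSS, selector):
        if not element.is_displayed():
            # A native click on a hidden element raises "element not interactable"
            continue
        # Bring the element into the viewport so the click lands on it
        driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
        element.click()
        clicked += 1
    return clicked
```

If a sticky header or overlay still intercepts the click, `driver.execute_script("arguments[0].click();", element)` dispatches the click in JavaScript instead, which skips Selenium's interactability check (and so can also "click" elements a user could not).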

1 answer:

Answer 0 (score: 0)

You can use a CSS class selector to get the collection of dropdowns, then loop over the collection and click each one.
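A sketch of that loop, with two choices that are assumptions rather than anything confirmed for this site. First, the list is re-queried on every pass, because expanding one accordion can re-render part of the page and leave earlier element references stale (`StaleElementReferenceException`). Second, the click is dispatched with JavaScript, which does not require the element to be visible. `click_all` is a hypothetical helper, and the default selector is the one from the question's update.

```python
CSS = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR


def click_all(driver, selector=".h2.accordion-toggle.event-name"):
    """Hypothetical helper: click each match once, by index; return the number clicked."""
    total = len(driver.find_elements(CSS, selector))
    clicked = 0
    for i in range(total):
        # Re-query each time: earlier clicks may have replaced the DOM nodes
        toggles = driver.find_elements(CSS, selector)
        if i >= len(toggles):
            break  # the page now has fewer matches than at the start
        # A JavaScript click skips Selenium's visibility/interactability check
        driver.execute_script("arguments[0].click();", toggles[i])
        clicked += 1
    return clicked
```

After `click_all(driver)` returns, the `market` elements can be read as in the question; a short wait may be needed first if the panels load their contents asynchronously.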
