find_element_by_xpath: only visible elements <class> <div id="xxxx"> </div> </class>

Time: 2014-03-11 19:00:39

Tags: selenium xpath selenium-webdriver web-scraping selenium-chromedriver

I am trying to scan only the visible elements, but

# ============================================================
#import codecs
#import requests
#import html5lib
#import string
import lxml.html as lh
from lxml import etree
import urllib
import urllib2
import os
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup
from pandas import *
import re
from datetime import datetime
from dateutil import parser
import time
import inspect
import itertools

chromedriver = "chromedriver_win32.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
browser = webdriver.Chrome(chromedriver)
URL = 'http://odds.7m.hk/en/default.shtml?t=3&dt=2011-08-13'
browser.get(URL)

#expand the wrapped/collapsed event list, which includes the leagues
browser.find_element_by_xpath('//*[@id="hlistMatch"]').click()

#untick every league checkbox except ENG Premier League (@value='92')
checkboxes = browser.find_elements_by_xpath('//input[@name="c_league" and not(@value="92") and @checked="checked"]')
for checkbox in checkboxes:
    if checkbox.is_selected():
        checkbox.click()
browser.find_element_by_xpath('//*[@id="league_input"]/span[1]/a').click()

browser.find_elements_by_xpath('//input[@id="bh473558"]/div')
Out[70]: []

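Why does a plain find_elements_by_xpath find nothing and return []? I only want to get the elements with a visible id. My screenshots are attached via the links below. Can someone shed some light on this?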

My question --- visible and invisible elements

Need xpath locators for visible elements

2 answers:

Answer 0: (score: 0)

You can always filter out the invisible elements:

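// Requires: using System.Linq; using OpenQA.Selenium;
var ele = Driver.FindElements(By.XPath("xpath"));
// Keep only the elements that are actually displayed on the page
var visibleEle = ele.Where(t => t.Displayed).ToList();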
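
Since the question uses the Python bindings, the same filter might look roughly like the sketch below. It relies only on `is_displayed()`, which Selenium's `WebElement` provides. The helper name `visible_only` and the `FakeElement` stand-in are made up for illustration, so that the snippet runs without a browser.

```python
def visible_only(elements):
    """Return only the elements whose is_displayed() is true."""
    return [el for el in elements if el.is_displayed()]


class FakeElement(object):
    """Hypothetical stand-in for a WebElement (not part of Selenium)."""

    def __init__(self, name, shown):
        self.name = name
        self._shown = shown

    def is_displayed(self):
        return self._shown


# Three fake elements, the middle one hidden.
elements = [FakeElement("a", True), FakeElement("b", False), FakeElement("c", True)]
visible_names = [el.name for el in visible_only(elements)]
print(visible_names)
```

With a real driver the call would presumably be `visible_only(browser.find_elements_by_xpath("xpath"))`, with "xpath" replaced by the actual locator.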

Answer 1: (score: 0)

I figured it out and it works... however, the code is too long.

lnk = soup.findAll('a', attrs={'class':['team_ls','lot_icon0'],
             'href':re.compile('http://data.7m.cn/matches_data/92/en/index.shtml|http://data.7m.cn/analyse/en/')})
EngPR = soup.findAll('a', href=re.compile('http://data.7m.cn/matches_data/92/en/index.shtml'))

matchID = []
df = lnk
for i in range(len(lnk)):
    if EngPR[0]['href'] == lnk[i]['href']:
        # Example: re.findall(r'.*?([0-9]+)', dflist[0])
        # Out[162]: ['7', '473558']
        # [-1] keeps the last match and drops the first one, the '7' in http://data.7m.cn
        df = re.findall(r'.*?([0-9]+)', lnk[i+1]['href'])[-1]
        matchID.append(df)
del lnk; del EngPR; del df; del i
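
A shorter variant of the loop above could look like this rough sketch. It works on a plain list of href strings rather than on `soup` results, so it runs on its own. The function name `match_ids_after` and the sample hrefs are assumptions made for illustration; only the league URL and the `['7', '473558']` shape of the regex output come from the code above.

```python
import re

LEAGUE_HREF = 'http://data.7m.cn/matches_data/92/en/index.shtml'


def match_ids_after(hrefs, league_href=LEAGUE_HREF):
    """For each league link, take the last run of digits from the link that follows it."""
    ids = []
    # Pair every href with its successor; a league link in last position has
    # no successor and is skipped instead of raising IndexError.
    for current, following in zip(hrefs, hrefs[1:]):
        if current == league_href:
            digits = re.findall(r'[0-9]+', following)
            if digits:
                # The last match is the match ID; earlier ones include the '7' in 7m.cn.
                ids.append(digits[-1])
    return ids


# Hypothetical hrefs, shaped like the ones the regex comment above implies.
sample_hrefs = [
    LEAGUE_HREF,
    'http://data.7m.cn/analyse/en/473558.shtml',
    'http://data.7m.cn/matches_data/34/en/index.shtml',
    'http://data.7m.cn/analyse/en/473600.shtml',
    LEAGUE_HREF,
    'http://data.7m.cn/analyse/en/473559.shtml',
]
print(match_ids_after(sample_hrefs))
```

On the real page the input would presumably be `[a['href'] for a in lnk]`.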