如何通过Python使用ChromeDriver Chrome和Selenium在页面上打印类别链接?

时间:2019-03-21 12:25:10

标签: python selenium google-chrome selenium-chromedriver webdriverwait

使用Python3,我试图通过Chrome Webdriver和Selenium来识别www.jtinsight.com网页上的各种“分类”类别,然后从中打印出类别标题。到目前为止,使用下面我能做的最好的代码来打印出前两个-“所有类别”和“汽车(私人)”。我已经确定这两个的html与其他HTML不同,并尝试了许多不同的代码行,这些代码行我已在代码中注释掉,但无法识别正确的标记/类/ xpath等。 任何帮助将不胜感激。

from selenium import webdriver
from selenium.webdriver.common.by import By

# Creating the WebDriver object using the ChromeDriver
driver = webdriver.Chrome()

# Directing the driver to the defined url
driver.get("https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main")

# Locate the categories

# Each code line runs but only returns the first two categories
# categories = driver.find_elements_by_xpath('//div[@class="col-md-3 col-sm-4 col-xs-6"]')
# categories = driver.find_elements_by_xpath('//div[@class="mainCatEntry"]')
# categories = driver.find_elements_by_xpath('//div[@class="Description"]')

# Process ran but finished with exit code 0
# categories = driver.find_elements_by_xpath('//*[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_xpath('//div[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_partial_link_text('//href[@class="divLink"]')
# categories = driver.find_elements_by_tag_name('bindonce')
# categories = driver.find_elements_by_xpath('//div[@class="divLink"]')

# Error before finished running
# categories = driver.find_elements(By.CLASS_NAME, "col-md-3 col-sm-4 col-xs-6 ng-scope")
# categories = driver.find_elements(By.XPATH, '//div bindonce[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_class_name('//div bindonce[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')

# Print out all categories on current page
num_page_items = len(categories)
print(num_page_items)
for i in range(num_page_items):
    print(categories[i].text)

# Clean up (close browser once task is completed.)
driver.close()

2 个答案:

答案 0 :(得分:1)

这确实是一个计时问题。如果我在收集类别之前添加了“ sleep(5)”,它会找到全部24个。有趣的是,当我使用WebDriverWait时,它仍然只能提取2个项目。因此,为强制驱动程序执行更多工作,我扩展了xpath。以下对我有用:

: >>> Possible starvation in striped pool.
    Deadlock: false
    Completed: 1397
Thread [name="sys-stripe-7-#8%server.node%", id=22, state=WAITING, blockCnt=3, waitCnt=757]
    Lock [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@b01791b, ownerName=sys-#214%server.node%, ownerId=248]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.awaitNext(FileWriteAheadLogManager.java:2871)
        at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$2300(FileWriteAheadLogManager.java:2451)
        at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.rollOver(FileWriteAheadLogManager.java:1205)
        at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:836)
        at o.a.i.i.processors.cache.GridCacheMapEntry.logUpdate(GridCacheMapEntry.java:4267)
        at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.update(GridCacheMapEntry.java:6333)
        at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6082)
        at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5782)
        at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3719)
        at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.access$5900(BPlusTree.java:3613)
        at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1895)
        at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1779)
        at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1638)
        at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1621)
        at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1935)
        at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:428)
        at o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2295)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3242)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:135)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:309)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:304)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
        at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
        at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
        at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
        at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
ERROR --- [tcp-disco-msg-worker-#2%server.node%] [] o.a.i.i.u.t.G                            : Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=sys-stripe-1, blockedFor=10s]
WARN --- [tcp-disco-msg-worker-#2%server.node%] [] o.a.i.i.u.t.G                            : Thread [name="sys-stripe-1-#2%server.node%", id=16, state=WAITING, blockCnt=0, waitCnt=754]
    Lock [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@b01791b, ownerName=sys-#214%server.node%, ownerId=248]

答案 1 :(得分:0)

要在网页https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main上标识各种 Classifieds 类别,并打印类别标题,例如所有类别汽车(私人)等,您需要向下滚动并诱导 WebDriverWait visibility_of_all_elements_located(),您可以使用以下解决方案:

  • 代码块:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main")
    driver.execute_script("arguments[0].scrollIntoView(true);",WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='ng-scope' and text()='Classifieds']"))));
    print([elem.get_attribute("innerHTML") for elem in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='mainCatEntry']//div[@class='Description']")))])