How do I click the "Search Company Name" box, type a company name, and search using Selenium and Python?

Time: 2019-05-20 07:30:56

Tags: python selenium selenium-webdriver search web-scraping

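I just want to click the search box, type a company name, and, if the company exists, open that company's link. I have tried many approaches but always run into problems. I am new to web scraping.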

I tried:

from selenium import webdriver

driver=webdriver.Chrome("C:/Users/RoshanB/Desktop/sentiment1/chromedriver_win32/chromedriver")

driver.get("http://www.careratings.com/brief-rationale.aspx")

But now I don't know how to click "Search Company Name", type a company name, and open that company's link using Selenium.

2 Answers:

Answer 0: (score: 0)

This should work for you; I used the id locator:

driver.maximize_window()

driver.get("http://www.careratings.com/brief-rationale.aspx")


search_company = driver.find_element_by_id("txtSearchCompany_brief")
search_company.send_keys("Abc India Limited")

# click() returns None, so keep a reference to the element and click it once
submit_button = driver.find_element_by_id("btn_submit")
submit_button.click()

However, you still have to write code for the case where the company name is not found.
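One way to handle the "not found" case is to check the result links for the name before trying to open anything. The sketch below assumes the links on the results page have already been collected as (text, href) pairs; the helper name `find_company_href` and the sample pairs are made up for illustration and are not part of the site or of Selenium.

```python
def find_company_href(links, company_name):
    """Return the href of the first link whose text matches company_name, else None."""
    wanted = company_name.strip().casefold()
    for text, href in links:
        if text.strip().casefold() == wanted:
            return href
    return None


# Hypothetical (text, href) pairs standing in for links scraped from the results page.
results = [
    ("Abc India Limited", "upload/abc.pdf"),
    ("Kavita Industries", "upload/kavita.pdf"),
]

href = find_company_href(results, "abc india limited")
if href is None:
    print("Company not found")
else:
    print("Open:", href)
```

Returning None instead of raising keeps the calling code simple: the script can skip the company, log it, or retry with a different spelling.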

Answer 1: (score: 0)

Try this:

from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from bs4.element import Tag

driver = webdriver.Chrome("C:/Users/RoshanB/Desktop/sentiment1/chromedriver_win32/chromedriver")
driver.get('http://www.careratings.com/brief-rationale.aspx')
time.sleep(4)

companyArray = []

try:
    search = driver.find_element_by_name('txtSearchCompany_brief')
    search.send_keys("Reliance Capital Limited")
    search.send_keys(Keys.RETURN)
    time.sleep(4)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    companies = soup.find("table",class_="table1")

    # open_pdf() must already be defined before this loop runs
    for tag in companies.findChildren():
        if isinstance(tag, Tag) and tag.name == 'a' and tag.has_attr('href'):
            url_string = "http://www.careratings.com/" + tag['href'].replace(" ", "%20")
            # Open the PDF file in a new browser window
            open_pdf(url_string)
            companyArray.append(tag.text)

except Exception as e:
    print(e)

driver.quit()
print(companyArray)

Output

Company list:

['Reliance Capital Limited', 'Dewan Housing Finance Corporation Limited Ratings of various Securitisation transactions', 'Gamut Infosystems Limited', 'Henraajh Feeds India Private Limited', 'Ramdhan Spintex', 'Tripurashwari Agro Product Private Limited', 'Kalyaneswari Polyfabs Private Limited', 'Rakesh Kumar Gupta Rice Mills Private Limited', 'Sri Satnam Jewells Private Limited', 'Pitambara Foods', 'Sujata Udit Builders Private Limited', 'Kavita Industries', 'Krishna Industries', 'Pallavi Motors Private Limited', 'Anjani Cotgin', 'Sarala Foods Private Limited', 'B.M. Enterprises', 'Bihani Agro Foods Private Limited', 'M V Agrotech Private Limited', 'J.S.R & Company', 'ARG Royal Ensign Developers Private Limited', 'Ranergy Solutions Private Limited', 'RSI Switchgear Private Limited', 'Jyoti Chandrashekhar Bawankule', 'Sadguru Engineers & Allied Services Private Limited', 'R B Rungta Steels & Food Products Private Limited', 'V. N. Marketing', 'Aussee Oats India Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Dewan Housing Finance Corporation Limited', 'Pacific Medical University', 'S. K. Pradhan Construction Company Private Limited', 'Stadmed Private Limited', 'Namra Finance Limited', 'S Kumars Associates', 'R. R. and Company Private Limited']
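The list above contains the same company several times (for example 'Dewan Housing Finance Corporation Limited'). If only distinct names are needed, a small post-processing step can drop the repeats while keeping the original order. This is a minimal sketch; `unique_in_order` is a name chosen here, and the short input list is an excerpt-style sample rather than the full output.

```python
def unique_in_order(names):
    """Remove duplicate names while preserving first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(names))


company_array = [
    "Reliance Capital Limited",
    "Dewan Housing Finance Corporation Limited",
    "Dewan Housing Finance Corporation Limited",
    "Pacific Medical University",
]

unique_companies = unique_in_order(company_array)
print(unique_companies)
```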

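If you want to scrape the company names, you need to install the BeautifulSoup package (the 'lxml' parser used above additionally requires the lxml package):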

pip install beautifulsoup4==4.7.1

Where:

txtSearchCompany_brief is the name of the search input

table1 is the class of the search results table
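To illustrate what "links inside the table with class table1" means, here is a small standalone sketch that uses Python's built-in html.parser instead of BeautifulSoup, so it runs without extra packages. The class name `TableLinkParser` and the sample HTML (including its href values) are invented for the example; the real page markup may differ.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect (text, href) pairs for <a href> tags inside <table class="table1">."""

    def __init__(self, table_class="table1"):
        super().__init__()
        self.table_class = table_class
        self.depth = 0            # table nesting depth; 0 means outside the target table
        self.current_href = None  # href of the <a> currently being read, if any
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            classes = (attrs.get("class") or "").split()
            if self.depth or self.table_class in classes:
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.current_href = attrs["href"]
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((text, self.current_href))
            self.current_href = None
        elif tag == "table" and self.depth:
            self.depth -= 1


# Made-up markup standing in for driver.page_source after a search.
SAMPLE_HTML = """
<a href="ignored.pdf">Outside the table</a>
<table class="table1">
  <tr><td><a href="upload/Reliance Capital Limited.pdf">Reliance Capital Limited</a></td></tr>
  <tr><td><a href="upload/Kavita Industries.pdf">Kavita Industries</a></td></tr>
</table>
"""

parser = TableLinkParser()
parser.feed(SAMPLE_HTML)
links = parser.links
print(links)
```

The link outside the table is skipped because the parser only records anchors while it is inside a table carrying the target class.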

Link on downloading PDF files:

Chromedriver, Selenium - Automate downloads

Link on reading PDF files:

How to read line by line in pdf file using PyPdf?

Open the PDF file in a new browser window (this function must be defined before the loop that calls it):

def open_pdf(url_string):
    driver1 = webdriver.Chrome("C:/Users/RoshanB/Desktop/sentiment1/chromedriver_win32/chromedriver")
    driver1.get(url_string)
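The URL passed to open_pdf is built in the answer by replacing spaces with %20. As a possible alternative, the standard library can do the escaping and the joining with the site root. This is only a sketch: `build_pdf_url` is a hypothetical helper, and the example href is invented, since the real href values on the site are not shown here.

```python
from urllib.parse import quote, urljoin

BASE_URL = "http://www.careratings.com/"


def build_pdf_url(href, base_url=BASE_URL):
    """Percent-encode unsafe characters in href and resolve it against base_url."""
    # Keep URL delimiters and existing %-escapes untouched; encode spaces and the like.
    encoded = quote(href, safe="/:?=&%")
    return urljoin(base_url, encoded)


# Invented relative link, in the style of a PDF href from the results table.
url_string = build_pdf_url("upload/Reliance Capital Limited.pdf")
print(url_string)
```

Compared with a plain string replace, this also handles hrefs that start with a slash or are already absolute URLs.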