How do I extract text information from an Angular website?

Time: 2019-03-20 13:18:50

Tags: python html angularjs selenium web-scraping

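I am trying to extract certain text fields from this website, but I am new to Angular. I am using Selenium to build this web scraper. I noticed that the exact text values are not stored in the HTML code. Could someone help or offer some tips for solving this? I tried using: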

$data['type']

but made no progress. Thanks :)

Here is one way I tried to extract the text:

find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

But when I run this code in the terminal, I get the errors shown in the tracebacks below it:
def csc():
    alpah_list = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P"]
    indexOfAlpha = 0
    indexOfSheet = 2
    for x in range(2,4):
        y = x + 2
        driver.implicitly_wait(20)
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div[2]/div/div['+ str(x) +']/div/div/div[6]/a').click()
        driver.implicitly_wait(20)
        worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/ul/li[2]/a/span').click()
        ranSleep()
        indexOfSheet += 1

P.S. Sorry, I can't share the website because it requires a private login.

Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element(By.cssSelector("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']")))
AttributeError: type object 'By' has no attribute 'cssSelector'
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py 
Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']")))
TypeError: 'str' object is not callable
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py 
Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
TypeError: 'str' object is not callable

Snippet of the text I want to extract, with the HTML and Angular code:

<input class="edited_field ng-pristine ng-untouched ng-valid ng-not-empty" type="text" ng-model="tab.content.site.name" ng-disabled="!tab.content.updateBtnPermission" disabled="disabled">

Qharr's error

Here is the code I wrote based on Qharr's comment:

def csc():
    alpah_list = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P"]
    indexOfAlpha = 0
    indexOfSheet = 2
    for x in range(2,4):
        y = x + 2
        driver.implicitly_wait(20)
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div[2]/div/div['+ str(x) +']/div/div/div[6]/a').click()
        driver.implicitly_wait(20)
        worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty'))
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/ul/li[2]/a/span').click()
        ranSleep()
        indexOfSheet += 1

1 answer:

Answer 0 (score: 0)

The current error is complaining about compound class names. Try:

driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty')

You may also need a wait condition, and you could probably shorten the selector so that it uses fewer classes.