Question

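My goal is to extract page content from a popup. Until now I have used PhantomJS for dynamic scraping, which works for most websites, but a few sites build their popups with AngularJS. On those sites dynamic scraping does not work, and in every such case the popup is opened by a tag that carries an ng-click attribute.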

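I first collect the tags that have ng-click, and after some filtering I end up with the tag that contains ng-label. I then use that tag and its ng-click value to locate the element with a CSS selector. This is the best option for me because I need to automate the process at large scale.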
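The selector-building step described above could be sketched as follows. This is only an illustration, not the asker's code: the helper name `ng_click_selector` and the sample values are hypothetical, and it assumes the ng-click value should be matched exactly through a CSS attribute selector.

```python
def ng_click_selector(tag_name, ng_click_value):
    """Build a CSS attribute selector such as a[ng-click="openLogin()"].

    Hypothetical helper for illustration. Backslashes and double quotes
    in the value are escaped so the quoted string stays valid CSS.
    """
    escaped = ng_click_value.replace("\\", "\\\\").replace('"', '\\"')
    return '{}[ng-click="{}"]'.format(tag_name, escaped)


print(ng_click_selector("a", "openLogin()"))  # a[ng-click="openLogin()"]
```

Matching on the attribute rather than on the class list avoids depending on classes that the framework may add or remove at runtime.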

My code works fine with Firefox but not with PhantomJS. It cannot find the CSS path.

# Imports required by the snippet.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Methods of the scraper class; the class statement and the definition of
# `cap` (the desired capabilities) are not shown in the question.
def __init__(self):
    # self.driver = webdriver.Firefox()
    self.driver = webdriver.PhantomJS(desired_capabilities=cap)
    # self.driver.implicitly_wait(90)
    # self.driver.set_page_load_timeout(90)
    # self.driver.set_script_timeout(90)
    self.driver.set_window_size(1120, 1000)

def get_data(self, link):
    soup = None
    tag_list = []
    try:
        self.driver.get(link)
        soup = BeautifulSoup(self.driver.page_source)
        # Every element that carries an ng-click attribute.
        elements = self.driver.find_elements_by_xpath("//*[(@ng-click)]")

        for element in elements:
            attr_value = element.get_attribute("ng-click")
            tag_name = element.tag_name
            tags = soup.find_all(tag_name, {'ng-click': attr_value})
            for tag in tags:
                try:
                    text = tag.text.lower()
                    # Keep only tags whose text looks like a login link.
                    if "login" in text or "log in" in text or "signin" in text or "sign in" in text:
                        comp_class_ = element.get_attribute('class')
                        class_ = comp_class_.replace(" ", ".")
                        css_path = str(tag_name + "." + class_)
                        # Wait up to 10 s for the element, then click it to open the popup.
                        target = WebDriverWait(self.driver, 10).until(
                            EC.presence_of_element_located((By.CSS_SELECTOR, css_path)))
                        target.click()
                        # self.driver.find_element_by_css_selector(css_path).click()
                        tag_list.append(tag)
                except Exception:
                    # Failures for a single tag are silently ignored.
                    pass
        print(self.driver.page_source.encode('utf-8'))
    except Exception:
        print("Error while scraping {}".format(link))
    return soup
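
One detail in the code above may be worth checking regardless of the browser: `comp_class_.replace(" ", ".")` produces an invalid selector when the class attribute is empty or contains leading, trailing, or repeated whitespace. The question does not establish whether this explains the PhantomJS failure. A more tolerant construction, using the hypothetical helper name `class_selector`, might look like this:

```python
def class_selector(tag_name, class_attr):
    """Join a tag name and a raw class attribute into a CSS selector.

    Hypothetical helper for illustration. str.split() with no argument
    drops empty pieces, so extra whitespace never yields '..' or a
    trailing '.'.
    """
    classes = (class_attr or "").split()
    return ".".join([tag_name] + classes)


print(class_selector("a", " btn  login-link "))  # a.btn.login-link
print(class_selector("a", ""))                   # a
```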

PhantomJS: unable to perform the onClick action

0 answers: