Need help scraping an img src from a website; my code is below

Time: 2018-06-13 20:26:49

Tags: html python-3.x selenium-webdriver web-scraping beautifulsoup

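Below is my web-scraping code for a website; it clicks a form that redirects to another page. From that page I need to extract the [img] src URL and export it as text to a CSV file. I use the code below to extract the content of td tags. When I run the same code for this cell it does not work, because the td tag has no text content, only an img tag. Any help would be greatly appreciated. I am new to web scraping. Thanks in advance.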

browser.find_element_by_css_selector("TextInput[value='Apply']").click()

            # select_finder = "//tr[contains(text(), 'NB')]//a"
            select_finder = "//td[text()='NB']/../td[2]/a"
            browser.find_element_by_css_selector(".content a").click()

            assert "Application Details" in browser.title
            file_data = []

            try:
                assert "Application Details" in browser.title

                enlargement = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[3]/td[2]/b").text
                enlargement_answer1 = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[4]/td[2]").text
                enlargement_answer2 = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[4]/td[3]").text
                enlargement_text = enlargement + enlargement_answer1 + enlargement_answer2

                considerations = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[2]/b").text
                considerations_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[3]").text
                considerations_text = considerations + considerations_answer

                alteration = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[6]/b").text
                alteration_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[7]").text
                alteration_text = alteration + alteration_answer

                units = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[5]/td[3]/b").text
                units_answer = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[5]/td[4]").text
                units_text = units + units_answer

                occupancy = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[6]/td[3]/b").text
                occupancy_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[6]/td[4]").text
                occupancy_text = occupancy + occupancy_answer

                coo = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[7]/td[3]/b").text
                coo_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[7]/td[4]").text
                coo_text = coo + coo_answer

                floors = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[8]/td[3]/b").text
                floors_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[8]/td[4]").text
                floors_text = floors + floors_answer

            except (NoSuchElementException, AssertionError) as e:
                # The *_text values are strings (and may not exist yet), so assign defaults instead of calling .append()
                floors_text = "No Zoning Characteristics Present"
                coo_text = "n/a"
                occupancy_text = "n/a"
                units_text = "n/a"
                alteration_text = "n/a"
                considerations_text = "n/a"
                enlargement_text = "n/a"


            with open('DOB.csv', 'a', newline='') as f:
                wr = csv.writer(f, dialect='excel')
                wr.writerow((block_number, lot_number, houseno, street, condo_text,
                             vacant_text, city_owned_text, file_data, floors_text, coo_text, occupancy_text, units_text, alteration_text,
                              considerations_text, enlargement_text ))

            browser.close()

1 answer:

Answer 0 (score: 0)

Since you say you are new to web scraping, I suggest reading this: http://selenium-python.readthedocs.io/locating-elements.html You are using XPath in a way that is not recommended.

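From the docs: "You can use XPath to either locate the element in absolute terms (not advised), or relative to an element that does have an id or name attribute." Try using a different locator to get your image, for example: driver.find_element_by_css_selector("img[src='images/box_check.gif']")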
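
Once the img element is located, its URL comes from the src attribute rather than from .text, which is empty for a cell that only holds an image. The following is a minimal sketch, not a definitive implementation: the helper names cell_value and append_row are hypothetical, and it assumes the cell is a Selenium WebElement. It uses the generic find_elements(by, value) call with the locator string "css selector" (the value of By.CSS_SELECTOR), because the find_element_by_* helpers used in the question were removed in later Selenium 4 releases.

```python
import csv

# Locator strategy string; identical to selenium's By.CSS_SELECTOR constant.
CSS = "css selector"


def cell_value(cell):
    """Return the src of the first <img> inside `cell`, or the cell's text.

    `cell` is assumed to be a Selenium WebElement (for example a <td>).
    """
    images = cell.find_elements(CSS, "img")
    if images:
        # An image-only cell has no text, so read the attribute instead.
        return images[0].get_attribute("src")
    return cell.text.strip()


def append_row(path, values):
    """Append one row to a CSV file (hypothetical helper for illustration)."""
    # newline="" lets the csv module control line endings on Python 3.
    with open(path, "a", newline="") as f:
        csv.writer(f, dialect="excel").writerow(values)
```

With the question's own locator this could be called as cell_value(browser.find_element("xpath", "/html/body/center/table[16]/tbody/tr[8]/td[4]")), and the returned string passed to append_row together with the other columns. Falling back to the text keeps one helper usable for both text cells and image cells.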