Question

I'm new to Stack Overflow, and I'm here because I've run into a problem with Scrapy and Selenium.

I use Scrapy to crawl a website and Selenium to get the position of the elements on each page, and then I write all of that to a CSV file.

To write the CSV, I run:

scrapy crawl spider_name -o myfile.csv

At the top of the file I create the driver with browser = webdriver.Firefox(), and then I use the code below to generate the items:

        # Load the same URL in Firefox and keep a screenshot of it
        browser.get(response.url)
        browser.save_screenshot(path+"/{}.png".format(c))

        # Every element on the page
        e = browser.find_elements_by_xpath("//*")
        # One dict, created once and reused for every item yielded below
        line = {}

        for elm in e:
            # Keep only leaf elements (no children) that have visible text
            nbreChildren = len(elm.find_elements_by_xpath("./*"))
            text = elm.text.strip()
            if nbreChildren == 0 and text != "":
                line['file'] = '{}.html'.format(c)
                line['font-size'] = elm.value_of_css_property('font-size')
                line['color'] = elm.value_of_css_property('color')
                line['font-weight'] = elm.value_of_css_property('font-weight')
                line['tag_name'] = elm.tag_name
                line['class'] = elm.get_attribute("class")
                line['text'] = elm.text

                # Position and size of the element on the rendered page
                location = elm.location
                x = location['x']
                y = location['y']
                size = elm.size
                height = size['height']
                width = size['width']

                line['x'] = x
                line['y'] = y
                line['height'] = height
                line['width'] = width
                line['url'] = response.url

                # Label: 1 = price, 2 = product name, 0 = anything else
                if text == prix[0]:
                    line['target'] = 1
                elif text == product_name[0]:
                    line['target'] = 2
                else:
                    line['target'] = 0

                yield line

But the CSV output of my crawl only has 2 or 3 rows instead of hundreds.
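
Separately from the timing question, one thing that may be worth ruling out is that the loop above yields the same `line` dict on every iteration. The sketch below is not the real spider: `reuse_one_dict`, `fresh_dict_each_time`, and the sample texts are made up for illustration, and it only shows what happens when a consumer keeps the yielded objects around before reading them. Whether this affects the CSV depends on when Scrapy serializes each item, which I have not verified.

```python
def reuse_one_dict(texts):
    """Mimics the loop above: one dict, mutated and yielded each time."""
    line = {}
    for text in texts:
        line["text"] = text
        yield line  # every yield hands out the very same object


def fresh_dict_each_time(texts):
    """Variant that builds a new dict for each element."""
    for text in texts:
        yield {"text": text}


# Hypothetical element texts, standing in for elm.text values
texts = ["19.99", "Blue mug", "Add to cart"]

shared = list(reuse_one_dict(texts))
fresh = list(fresh_dict_each_time(texts))

# All three entries in `shared` are one object holding the last values
print(len({id(item) for item in shared}))
print([item["text"] for item in shared])

# Each entry in `fresh` keeps its own values
print([item["text"] for item in fresh])
```

If the items are only read after the generator has moved on, every stored row ends up showing the last element's values, so creating the dict inside the loop is the safer pattern either way.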

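I suspect the problem may be that Scrapy runs too fast for Selenium, but I haven't found an effective way to fix it (DOWNLOAD_DELAY makes Scrapy wait on every page it finds, but I'm not interested in all of the pages).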
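
To make the idea of waiting only on the pages I care about concrete, here is a minimal sketch of a per-page polling wait, as opposed to a global DOWNLOAD_DELAY. The helper `wait_until` and the `FakePage` stand-in are hypothetical names written for this example; in a real spider, Selenium's own `WebDriverWait(browser, timeout).until(...)` from `selenium.webdriver.support.ui` would probably be the more natural tool for the same job.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout`
    seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "condition not met within {} s".format(timeout))
        time.sleep(interval)


class FakePage:
    """Stand-in for a page whose elements appear on the third check."""

    def __init__(self):
        self.checks = 0

    def leaf_elements(self):
        self.checks += 1
        return ["price", "name"] if self.checks >= 3 else []


page = FakePage()
# Blocks only for this page, and only until the elements show up
elements = wait_until(page.leaf_elements, timeout=2.0, interval=0.01)
print(elements)
```

The wait would go right after `browser.get(response.url)`, so pages that render quickly cost almost nothing and pages that are skipped are never delayed.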

I'm happy to provide any further information. Thanks :)

Scrapy outputs an incomplete CSV

0 answers: