How do I pass an argument to a class when instantiating an object?

Asked: 2017-01-04 14:13:26

Tags: python qt class scrapy pyqt5

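I want to build a GUI with two buttons, "Open Input File" and "Run". When the user clicks "Open Input File", they can select a file on their computer that contains one column of URLs. When they click "Run", a Scrapy-based script is initialized that uses the URLs from the input file as start_urls (as in, for example, https://doc.scrapy.org/en/latest/topics/spiders.html).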

My script looks like this:

import scrapy
import sys
from PyQt5 import QtCore, QtGui, QtWidgets
from PyQt5.QtWidgets import QApplication, QMainWindow, QFileDialog
from scrapy.crawler import CrawlerProcess
file = "Empty"

class MySpider(scrapy.Spider):
    global file
    name = "scriptTest" #name of spider
    allowed_domains = ["web"] #where is spider allowed to crawl
    start_urls = [file] #where will spider crawl

    def parse(self): #scrapes start_urls according to instructions and returns results

class MyGui(object): #gives description of class type MyGui
    filename = 'Empty'
    file = []
    def setupUi(self): #describes how base form of gui will look

    def buttons(self): #creates buttons and connects functions to those buttons
        self.pushButton.setText(_translate("MainWindow", "Open Input File:")) #creates button with text
        self.pushButton.clicked.connect(self.showDialog) #connects button one to function showDialog
        self.pushButton_2.setText(_translate("MainWindow", "Run")) #creates button2 with text
        self.pushButton_2.clicked.connect(self.runSpider) #connects button two to function runSpider

    def showDialog(self): #opens QFileDialog and sets global file to name of selected file

    def runSpider(self): #should start crawling urls from selected file
        global file
        global filename

        def getUrls(filename): #returns first column containing urls (given by gui user in showDialog) as array.

        file = getUrls() #sets global variable file to the value returned by getUrls
        process = CrawlerProcess() #creates object 'process' that is of type 'Crawlerprocess'
        process.crawl(MySpider) #starts crawling
        process.start()  # the script will block here until the crawling is finished

app = QApplication(sys.argv)
window = QMainWindow()

ui = MyGui() #creates object called 'ui' of type 'MyGui'
ui.setupUi(window) #launches gui window

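As I said, I want to use the URLs from the selected file as the spider's start_urls once pushButton has been clicked. However, when I click "Run", the spider uses the value "Empty" as start_urls instead of the new value of the global variable file. I think I understand why: the class is a description of the object, so when the object is initialized it gets the attributes exactly as the class describes them.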

I tried to solve the problem like this:

class MySpider:
    def __init__(self, arg):
        self.arg = arg

But I have not found a solution yet.

Q: How do I pass the file the user selected to the MySpider class?

Thanks in advance, and please correct me if I got something wrong! (Sorry if my code is messy or unclear; I am still learning a lot.)

1 Answer:

Answer 0: (score: 1)

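start_urls = [file] is not updated when file is updated. The expression is evaluated once, when the class body runs, so the list keeps a reference to the object that file pointed to at that moment (the string "Empty"). Assigning a new value to file later does not change the list.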
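The effect can be reproduced without Scrapy. Below is a minimal sketch in which Demo is a hypothetical stand-in for MySpider and the example.com URLs are made up for illustration:

```python
file = "Empty"  # module-level global, as in the question


class Demo:  # hypothetical stand-in for MySpider
    # Evaluated once, while the class body runs: the list stores the
    # object that `file` refers to at this moment.
    start_urls = [file]


# Rebinding the global name later leaves the existing list untouched.
file = ["http://example.com/a", "http://example.com/b"]

print(Demo.start_urls)  # ['Empty']
```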

A quick fix (I'm sure better solutions exist) is to update the start_urls class variable directly:

MySpider.start_urls = getUrls()
process.crawl(MySpider) #starts crawling

The advantage is that you no longer need the global file variable.
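If you would rather hand the URLs to each spider object, as the __init__ attempt in the question does, an instance attribute should work as well. As far as I know, process.crawl(MySpider, start_urls=urls) forwards extra keyword arguments to the spider's constructor, and the default scrapy.Spider constructor copies them onto the instance; please check that against the Scrapy documentation for your version. The sketch below only imitates that pattern in plain Python, so it runs without Scrapy. BaseSpider and UrlSpider are hypothetical stand-ins, not Scrapy classes, and the URLs are made up:

```python
class BaseSpider:  # hypothetical stand-in, NOT scrapy.Spider
    start_urls = []  # class-level default

    def __init__(self, **kwargs):
        # Store every keyword argument as an instance attribute.
        self.__dict__.update(kwargs)


class UrlSpider(BaseSpider):  # hypothetical spider
    name = "scriptTest"


urls = ["http://example.com/a", "http://example.com/b"]
spider = UrlSpider(start_urls=urls)

print(spider.start_urls)     # the instance attribute shadows the class default
print(UrlSpider.start_urls)  # the class default itself is unchanged
```

With this approach nothing on the class is mutated, so two spiders created with different URL lists would not affect each other.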