With Scrapy

Date: 2017-02-27 21:22:33

Tags: python ajax scrapy

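I have code that uses a for loop, and it works perfectly. However, I am struggling to implement the same thing with a while loop. It looks like I am getting empty JSON objects. How can I make the while loop work, keeping in mind that at some point the JSON object becomes {"data":[],"result":"ok"}?

My while loop: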

def after_login(self,response):
    if "smg" in response.body:
        #for i in range(0,100,10):
        minime = 2
        i = 10
        while len(self.parse_firstcall(response)['data']) > 1 or minime > 1:
            print('------------------------------------')
            print(len(self.parse_firstcall(response)['data']))
            print(str(minime))
            print(str(i))
            print('-------------------------------------')
            yield FormRequest(
                url='URL',
                formdata={'act': 'serial', 'type': 'search', 'o': str(i), 's': '3','t': '0'},
                callback=self.parse_firstcall
            )
            minime = 0
            i += 10
            time.sleep(5)



def parse_firstcall(self,response):
    try:
        firstc = response.body      
        self.serialj = json.loads(firstc)
    except:
        self.serialj = {"data":['why', 'always', 'me'], "result": "ok"}
    return self.serialj

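1 answer:

Answer 0 (score: 0)

The solution I found: there is no need for a while loop here. Scrapy only schedules a yielded request and passes its response to the callback later, so the loop condition never sees the new data. Simply make the call, check in the callback whether len() of the data is greater than 1, and request the next page from there.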

def after_login(self, response):
    # self.req (current offset) and self.series (collected records) are
    # assumed to be initialised elsewhere on the spider.
    if "smg" in response.body:
        # Request only the first page; parse_firstcall schedules the rest.
        yield FormRequest(
            url='url',
            formdata={'act': 'serial', 'type': 'search', 'o': str(self.req), 's': '3', 't': '0'},
            callback=self.parse_firstcall
        )


def parse_firstcall(self, response):
    serialj = json.loads(response.body)
    # Keep paginating only while the server still returns data.
    if len(serialj['data']) > 1:
        print('///////////////////////////////////////////')
        print('Request number: ' + str(self.req) + ' been made')
        print('///////////////////////////////////////////')
        for i in serialj['data']:
            self.series[i['title_orig']] = i
        # Advance the offset and chain the next request to this same callback.
        self.req += 10
        yield FormRequest(
            url='url',
            formdata={'act': 'serial', 'type': 'search', 'o': str(self.req), 's': '3', 't': '0'},
            callback=self.parse_firstcall
        )
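
The callback chaining above can be tried out without Scrapy. The following is only a rough sketch under stated assumptions: fake_server, PagingSketch, run, PAGE_SIZE and TOTAL_ITEMS are hypothetical names invented for illustration, and the tiny queue-based scheduler merely imitates how a crawler hands each response to its callback. Unlike the answer's `> 1` check, the sketch stops when "data" is empty.

```python
import json
from collections import deque

PAGE_SIZE = 10     # assumed page size, matching the "+= 10" offset step
TOTAL_ITEMS = 25   # hypothetical number of records on the fake server


def fake_server(offset):
    """Hypothetical stand-in for the AJAX endpoint: one page as a JSON string."""
    last = min(offset + PAGE_SIZE, TOTAL_ITEMS)
    items = [{"title_orig": "title-%d" % n} for n in range(offset, last)]
    return json.dumps({"data": items, "result": "ok"})


class PagingSketch(object):
    def __init__(self):
        self.req = 0       # offset of the next page to request
        self.series = {}   # collected records keyed by title_orig

    def start(self):
        # Only the first request is issued here.
        yield (self.req, self.parse_page)

    def parse_page(self, body):
        payload = json.loads(body)
        if payload["data"]:
            for item in payload["data"]:
                self.series[item["title_orig"]] = item
            self.req += PAGE_SIZE
            # Chain the next page to this same callback.
            yield (self.req, self.parse_page)
        # Empty "data": nothing is yielded, so the chain ends here.


def run(spider):
    """Minimal scheduler: fetch each queued request, then run its callback."""
    queue = deque(spider.start())
    fetched = 0
    while queue:
        offset, callback = queue.popleft()
        fetched += 1
        queue.extend(callback(fake_server(offset)))
    return fetched


spider = PagingSketch()
requests_made = run(spider)
print(requests_made, len(spider.series))  # prints: 4 25
```

Four requests are made for 25 records: three pages with data and one final empty page that ends the chain. The decision to continue lives in the callback because that is the only place where the new response is available.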