I have a simple piece of code in Scrapy -
def start_requests(self):
    response = scrapy.Request(url, callback=self.parse_response)
    response.meta['some_useful_params'] = some_useful_params
    yield response

def parse_response(self, response):
    some_useful_params = response.meta['some_useful_params']
    do_parsing_stuff()
    if some_condition:
        presponse = scrapy.Request(otherurl, callback=self.parse_response)
        presponse.meta['some_useful_params'] = some_useful_params
        yield presponse
    else:
        yield items
The above code works fine for me, but now I need to change it so that it first checks whether the HTML for the page already exists; if it does, it should use that HTML instead of making a request to the website.
Current code -
def start_requests(self):
    if html_exist:
        request = scrapy.Request(url)
        request.meta['some_useful_params'] = some_useful_params
        response = scrapy.http.Response(url, body=cached_html, request=request)
        # the line below doesn't call the parse_response method
        self.parse_response(response)
    else:
        response = scrapy.Request(url, callback=self.parse_response)
        response.meta['some_useful_params'] = some_useful_params
        yield response

def parse_response(self, response):
    some_useful_params = response.meta['some_useful_params']
    do_parsing_stuff()
    if some_condition:
        if html_exist:
            request = scrapy.Request(url)
            request.meta['some_useful_params'] = some_useful_params
            presponse = scrapy.http.Response(url, body=cached_html, request=request)
            # the line below doesn't call the parse_response method
            self.parse_response(presponse)
        else:
            presponse = scrapy.Request(otherurl, callback=self.parse_response)
            presponse.meta['some_useful_params'] = some_useful_params
            yield presponse
    else:
        yield items
The problem I'm facing is that in the second version, when the HTML already exists, the parse_response method is never actually run.
I don't fully understand why, but I think it has something to do with Python generators. How can I fix this?
Answer 0 (score: 0)
You have to yield requests or items, not just call the method:
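A minimal sketch of what that means for the cached-HTML branch of start_requests, under the question's own assumptions (html_exist, cached_html, url and some_useful_params defined elsewhere). Because start_requests and parse_response are generators, calling self.parse_response(response) only creates a generator object and never executes its body; you have to iterate over it and re-yield what it produces. This sketch also swaps scrapy.http.Response for scrapy.http.HtmlResponse so that CSS/XPath selectors work on the cached body:

def start_requests(self):
    if html_exist:
        request = scrapy.Request(url)
        request.meta['some_useful_params'] = some_useful_params
        # build a fake response around the cached HTML; response.meta
        # is taken from the attached request's meta
        response = scrapy.http.HtmlResponse(url, body=cached_html,
                                            encoding='utf-8', request=request)
        # parse_response is a generator: iterate it and yield its results
        for result in self.parse_response(response):
            yield result
    else:
        request = scrapy.Request(url, callback=self.parse_response)
        request.meta['some_useful_params'] = some_useful_params
        yield request

The same pattern applies to the nested html_exist branch inside parse_response: loop over self.parse_response(presponse) and yield each result (on Python 3 both loops can be shortened to yield from self.parse_response(...)). One caveat: some Scrapy versions only accept Request objects from start_requests, so if the cached branch can end up yielding items directly, the re-yield approach is safest inside parse_response.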