My rule for the link extractor is as follows; it is used for pagination:
...
and I am making the request here:
rules = (
    Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[@class="button next"]',)), callback="parse_items", follow=True),
)
But if I use a proxy, the rule extractor does not iterate; if I remove this start_requests method:
def start_requests(self):
    CrawlSpider.start_requests(self)
    in_url = self.base_url + "Ahmedabad"
    print "******************req_url********************", in_url
    req = Request(in_url, dont_filter=True)
    req.meta['proxy'] = "http://52.71.9.25:8080"
    yield req
it works fine.
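For context, here is a minimal sketch of how I assume these two pieces sit together in one spider; the class name, the base_url value and the parse_items body are placeholders, and the imports are for the older Scrapy API that still provides SgmlLinkExtractor:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.http import Request


class CitySpider(CrawlSpider):
    # name, base_url and parse_items are placeholders;
    # the rules and start_requests come from the snippets above
    name = "city_spider"
    base_url = "http://www.example.com/"

    rules = (
        Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[@class="button next"]',)),
             callback="parse_items", follow=True),
    )

    def start_requests(self):
        CrawlSpider.start_requests(self)  # returns a generator that is not consumed here
        in_url = self.base_url + "Ahmedabad"
        print "******************req_url********************", in_url
        req = Request(in_url, dont_filter=True)
        req.meta['proxy'] = "http://52.71.9.25:8080"  # only this first request carries the proxy
        yield req

    def parse_items(self, response):
        # placeholder callback; the real parsing logic is omitted
        pass

As far as I can tell, only the single request yielded from start_requests carries the proxy in its meta; the requests generated by the rules do not inherit it.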
Please help me solve this.
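A possible variation (not verified as a fix; the helper name set_proxy is mine) would be to attach the proxy through the Rule's process_request hook, so that the requests built by the link extractor carry it as well:

def set_proxy(request):
    # copy the proxy onto every request built from this rule
    request.meta['proxy'] = "http://52.71.9.25:8080"
    return request

rules = (
    Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[@class="button next"]',)),
         callback="parse_items", follow=True, process_request=set_proxy),
)

In the Scrapy versions that still ship SgmlLinkExtractor, process_request receives the request and should return a request (or None to drop it).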