I have a Pandas DataFrame that I want to use as the Scrapy start URLs. The function get_links opens the xlsx into a DataFrame, which has a column LINK that I want to run the spider over.
I use df.to_dict(orient='records') to convert it to a dict.
I know the links can be accessed with url = url['LINK'], but what I want is to pass the whole dict through to the Scrapy output.
dictdf = df.to_dict(orient='records')
My question is: is there a way to pass the whole dict into parse(), so that dictdf is also yielded in the output alongside what Scrapy scrapes? For example:
dictdf = {'Data1':'1','Data2':'2','LINK':'www.link.com',.....,'Datan':'n'}
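For reference, a minimal sketch of the conversion step, using a small hypothetical frame in place of the real xlsx data:

```python
import pandas as pd

# Hypothetical one-row frame standing in for the xlsx contents
df = pd.DataFrame({'Data1': ['1'], 'Data2': ['2'], 'LINK': ['www.link.com']})

# orient='records' produces one dict per row, which is what the spider iterates over
dictdf = df.to_dict(orient='records')
print(dictdf)  # [{'Data1': '1', 'Data2': '2', 'LINK': 'www.link.com'}]
```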
# start urls
def start_requests(self):
    urls = get_links()
    for url in urls:
        yield scrapy.Request(url=url['LINK'], callback=self.parse)
Answer 0 (score: 2)
If I understand correctly, you want to carry some data over from the start_requests method. For that you can use the Request.meta attribute:
def start_requests(self):
    data = [{
        'url': 'http://httpbin.org',
        'extra_data': 'extra',
    }]
    for item in data:
        yield Request(item['url'], meta={'item': item})

def parse(self, response):
    item = response.meta['item']
    # {'url': 'http://httpbin.org', 'extra_data': 'extra'}
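Inside parse you can then merge the carried record with whatever you extract from the response before yielding it. A minimal sketch of that merge step in plain Python (scraped stands in for hypothetical parsed fields; no running Scrapy is needed to see the idea):

```python
# What parse() would read back out of response.meta['item']
record = {'Data1': '1', 'Data2': '2', 'LINK': 'www.link.com'}

# Hypothetical fields extracted from the response
scraped = {'title': 'Example page'}

# Merge and yield this dict from parse(); scraped keys win on collision
item = {**record, **scraped}
print(item)  # {'Data1': '1', 'Data2': '2', 'LINK': 'www.link.com', 'title': 'Example page'}
```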