环境: Windows7,Python 3.6.5,Scrapy 1.5.1
问题描述:
我有一个名为project_github
的项目,其中包含3个蜘蛛:spider1
,spider2
,spider3
。这些蜘蛛中的每一个都会将特定网站的数据抓取到该蜘蛛。
我正在尝试在执行特定的Spider时自动导出JSON文件,格式为NameOfSpider_TodaysDate.json
,以便从命令行可以:
执行脚本scrapy crawl spider1
,该脚本返回spider1_181115.json
当前,我在ITEM EXPORTERS
中使用settings.py
,其代码如下:
import datetime
FEED_URI = 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json'
FEED_FORMAT = 'json'
FEED_EXPORTERS = {'json': 'scrapy.exporters.JsonItemExporter'}
FEED_EXPORT_ENCODING = 'utf-8'
很明显,无论使用什么蜘蛛,这段代码总是写spider1_TodaysDate.json
。有什么建议吗?
答案 0 :(得分:0)
执行此操作的方法是将<!-- Input -->
<bean id="lastModifiedFileComparator" class="org.apache.commons.io.comparator.LastModifiedFileComparator"/>
<int-file:inbound-channel-adapter
id="inputAdapter"
channel="inputChannel"
directory="file:${input.files.path}"
comparator="lastModifiedFileComparator"
scan-each-poll="true">
<int:poller max-messages-per-poll="1" fixed-rate="5000">
<int:transactional transaction-manager="transactionManager" isolation="READ_COMMITTED" propagation="REQUIRED" timeout="60000" synchronization-factory="syncFactory"/>
</int:poller>
</int-file:inbound-channel-adapter>
<!-- Continue only if the concurrentmetadatastore doesn't contain the file. If if is not the case : insert it in the metadatastore -->
<int:filter input-channel="inputChannel" output-channel="processChannel" discard-channel="nullChannel" throw-exception-on-rejection="false" expression="@jdbcMetadataStore.putIfAbsent(headers[file_name], headers[timestamp]) == null"/>
<!-- Rollback by removing the file from the metadatastore -->
<int:transaction-synchronization-factory id="syncFactory">
<int:after-rollback expression="@jdbcMetadataStore.remove(headers[file_name])" />
</int:transaction-synchronization-factory>
<!-- Metadatastore configuration -->
<bean id="jdbcDataSource" class="org.apache.commons.dbcp.BasicDataSource">
<property name="url" value="jdbc:h2:file:${database.path}/shared;AUTO_SERVER=TRUE;AUTO_RECONNECT=TRUE;MVCC=TRUE"/>
<property name="driverClassName" value="org.h2.Driver"/>
<property name="username" value="${database.username}"/>
<property name="password" value="${database.password}"/>
<property name="maxIdle" value="4"/>
<property name="defaultAutoCommit" value="false"/>
</bean>
<bean id="jdbcMetadataStore" class="org.springframework.integration.jdbc.metadata.JdbcMetadataStore">
<constructor-arg ref="jdbcDataSource"/>
</bean>
<bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
<property name="dataSource" ref="jdbcDataSource"/>
</bean>
<!-- Workflow -->
<int:chain input-channel="processChannel" output-channel="outputChannel">
<int:service-activator ref="fileActivator" method="fileRead"/>
<int:service-activator ref="fileActivator" method="fileProcess"/>
<int:service-activator ref="fileActivator" method="fileAudit"/>
</int:chain>
<!-- Output -->
<int-file:outbound-channel-adapter
id="outputChannel"
directory="file:${output.files.path}"
filename-generator-expression ="payload.name">
<!-- Delete the source file -->
<int-file:request-handler-advice-chain>
<bean class="org.springframework.integration.handler.advice.ExpressionEvaluatingRequestHandlerAdvice">
<property name="onSuccessExpressionString" value="headers[file_originalFile].delete()"/>
</bean>
</int-file:request-handler-advice-chain>
</int-file:outbound-channel-adapter>
定义为要为其编写项目导出程序的特定蜘蛛网下的custom_settings
属性。蜘蛛设置会覆盖项目设置。
因此,对于class
:
spider1