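I want to load a JSON file from within scrapyd, but the file is never found, no matter how I reference it.
Normally I call open() with the file name. I have tried placing the file in the egg root as well as in the project's egg folder, but in neither case is it found.
If anyone knows how to do this and can show a complete example, I would be very grateful.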
My setup.py looks like this:
from setuptools import setup, find_packages
import os, sys

directory, filename = os.path.split(os.path.realpath(__file__))
sys.path.append(directory)

setup(
    name='myscraper',
    version='1.0',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = myscraper.local_settings']},
    install_requires=['selenium', 'scrapy', 'pyyaml', 'yamlordereddictloader', 'pyvirtualdisplay'],
    package_data={
        'mypackage': ['myscraper/configuration/seeds.json', 'myscraper/configuration/*.yml'],
    },
    data_files=[("mydata", ["myscraper/configuration/seeds.json"])],
    include_package_data=True,
    zip_safe=False
)
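One thing worth checking in the setup.py above: the keys of package_data are importable package names, and the glob patterns are resolved relative to that package's directory. A minimal sketch of the mapping, assuming the package is myscraper as in the layout shown in this question (PACKAGE_DATA is a name introduced here purely for illustration):

```python
# Assumption: the importable package is "myscraper" and the data files
# live in myscraper/configuration/, as in the project layout of the question.
# Keys are package names; patterns are relative to the package directory.
PACKAGE_DATA = {
    "myscraper": ["configuration/*.json", "configuration/*.yml"],
}

# It would be passed to setuptools like this (not executed here):
#   setup(..., packages=find_packages(), package_data=PACKAGE_DATA)
```

With a mapping like this, the key matches a package that find_packages() actually returns, which the 'mypackage' key does not appear to do for this layout.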
Project structure:
- my_crawler
--- setup.py
--- myscraper
------- configuration
-------------seeds.json
------- myspider.py
------- ...
How can I read the JSON file in myspider.py?
How can I read all the YAML files in the configuration folder?
I would like to use code similar to this:
# how to get the content from seeds.json ?
content = pkgutil.get_data('mypackage', filename)

# how to walk the configuration directory from the egg?
for root, dirs, files in os.walk(config_dir):
    for config_file in files:
        config_file = open(os.path.join(root, config_file))
        config_dict = yaml.load(config_file, Loader=yamlordereddictloader.Loader)
        visit = config_dict.get("visit")
        self.configuration[visit] = config_dict
https://github.com/scrapy/scrapyd-client
https://groups.google.com/forum/#!msg/scrapy-users/B70eq1_N3Fk/vR7aDeizj_sJ
https://support.scrapinghub.com/topics/1717-deploying-projects-with-resource-non-code-files/
https://support.scrapinghub.com/topics/725-including-additional-files-with-a-spider/
Answer 0 (score: 0)
from pkg_resources import resource_string
...
file_string = resource_string(
    __name__.split('.')[0],
    'myscraper/configuration/seeds.json',
)
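The same idea can be sketched with the standard library only, covering both parts of the question (the JSON file and every YAML file in the folder). This is a rough sketch under several assumptions: Python 3.9 or newer, a package named myscraper whose configuration files were actually shipped as package data, and resource names given relative to the package directory (so configuration/seeds.json for the layout in the question). The helper names load_seeds and read_config_texts are made up for this example, and I have not verified it against a real scrapyd deployment.

```python
import importlib.resources
import json
import pkgutil


def load_seeds(package="myscraper", resource="configuration/seeds.json"):
    """Return the parsed JSON resource, read through the package's loader."""
    # The resource name is relative to the package directory.
    raw = pkgutil.get_data(package, resource)
    if raw is None:
        # get_data returns None when the loader cannot serve data files.
        raise FileNotFoundError(f"{resource} not available in package {package}")
    return json.loads(raw.decode("utf-8"))


def read_config_texts(package="myscraper", subdir="configuration", suffix=".yml"):
    """Map each matching file name in the package sub-directory to its text."""
    # importlib.resources.files() requires Python 3.9+.
    folder = importlib.resources.files(package).joinpath(subdir)
    return {
        entry.name: entry.read_text(encoding="utf-8")
        for entry in folder.iterdir()
        if entry.is_file() and entry.name.endswith(suffix)
    }
```

In the spider, each text returned by read_config_texts could then be handed to yaml.load with the same Loader used in the question, in place of the os.walk and open calls, since those rely on the files existing as ordinary paths on disk.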