I hope the following question isn't too long, but I couldn't find a shorter way to explain what I want:
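Building on what I learned from How to use importlib to import modules from arbitrary sources? (my question from yesterday), I wrote a specific loader for a new file type (.xxx). (In fact, xxx is an encrypted version of pyc, to protect the code from being stolen.)
I would like to add an import hook for the new file type "xxx" without affecting the other types (.py, .pyc, .pyd) in any way.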
Currently, the loader is ModuleLoader, which inherits from importlib.machinery.SourcelessFileLoader.
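The question doesn't show ModuleLoader itself. As a rough sketch of what such a loader can look like, assuming a toy XOR scheme in place of the real (unspecified) encryption, one can subclass SourcelessFileLoader and decrypt inside get_data. The class name XorPycLoader, the key and both helper functions are hypothetical.

```python
import marshal
from importlib.machinery import SourcelessFileLoader
from importlib.util import MAGIC_NUMBER

# Hypothetical key for a toy XOR "cipher" standing in for real encryption.
# XOR like this is only a placeholder, not actual protection.
KEY = b"not-a-real-secret"


def xor_bytes(data, key=KEY):
    """Symmetric toy transform: applying it twice returns the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def make_xxx_payload(source, filename="<xxx>"):
    """Build the bytes of an 'encrypted' .xxx file: a normal .pyc image
    (magic number, 4 flag bytes, 8 unused header bytes, marshalled code)
    passed through the XOR transform."""
    code = compile(source, filename, "exec")
    pyc = MAGIC_NUMBER + b"\x00" * 12 + marshal.dumps(code)
    return xor_bytes(pyc)


class XorPycLoader(SourcelessFileLoader):
    """Decrypt on read; SourcelessFileLoader.get_code then checks the
    .pyc header and unmarshals the code object as usual."""

    def get_data(self, path):
        return xor_bytes(super().get_data(path))
```

Since SourcelessFileLoader.get_code reads the file through get_data before parsing the 16-byte header (CPython 3.7+), overriding get_data is enough to slot decryption in without reimplementing the header handling.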
Using sys.path_hooks, the loader should be added as a hook:
    import sys
    import importlib.machinery

    myFinder = importlib.machinery.FileFinder
    loader_details = (ModuleLoader, ['.xxx'])  # ModuleLoader: my SourcelessFileLoader subclass
    sys.path_hooks.append(myFinder.path_hook(loader_details))
Note: the loader is activated by calling modloader.activateLoader(). When I then try to load a module named test (test.xxx), I get:
    >>> import modloader
    >>> modloader.activateLoader()
    >>> import test
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named 'test'
    >>>
However, when I clear sys.path_hooks before adding the hook:
    sys.path_hooks = []
    sys.path.insert(0, '.')  # current directory
    sys.path_hooks.append(myFinder.path_hook(loader_details))
it works:
    >>> modloader.activateLoader()
    >>> import test
    using xxx class
    in xxxLoader exec_module
    in xxxLoader get_code: .\test.xxx
    ANALYZING ...
    GENERATE CODE OBJECT ...
      2           0 LOAD_CONST               0
                  3 LOAD_CONST               1 ('foo2')
                  6 MAKE_FUNCTION            0
                  9 STORE_NAME               0 (foo2)
                 12 LOAD_CONST               2 (None)
                 15 RETURN_VALUE
    >>> test
    <module 'test' from '.\\test.xxx'>
The file content is converted into a code object, and the module is imported correctly.
But I can't load the same module from a package: import pack.test
Note: __init__.py is, of course, an empty file in the package directory.
    >>> import pack.test
    Traceback (most recent call last):
      File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
    AttributeError: 'module' object has no attribute '__path__'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named 'pack.test'; 'pack' is not a package
    >>>
Even worse, I can no longer load ordinary *.py modules from that package either; I get the same error as above:
    >>> import pack.testpy
    Traceback (most recent call last):
      File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
    AttributeError: 'module' object has no attribute '__path__'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named 'pack.testpy'; 'pack' is not a package
    >>>
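As I understand it, sys.path_hooks is traversed until the last entry has been tried. So why does the first variant (without clearing sys.path_hooks) not recognize the new extension "xxx", while the second variant (with sys.path_hooks cleared) does?
It looks as though the machinery raises an exception when an entry of sys.path_hooks doesn't recognize "xxx", instead of moving on to the next entry.
And why does the second variant work for py, pyc and xxx modules in the current directory, but not in the package pack? I would have expected py and pyc not to work even in the current directory, since sys.path_hooks only contains a hook for "xxx"...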
Answer 0 (score: 4)
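The short answer is that the default path finder in sys.meta_path isn't meant to have new file extensions and importers added for the same paths it already supports. But there's still hope!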
Quick breakdown
sys.path_hooks is consumed by the importlib._bootstrap_external.PathFinder class.
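When an import happens, each entry in sys.meta_path is asked to find a matching spec for the requested module. The PathFinder then takes the entries of sys.path and passes each one to the factory functions in sys.path_hooks. Each factory function gets a chance either to raise ImportError (basically the factory saying "nope, I don't support this path entry") or to return a finder instance for that path. The first finder that is successfully returned is then cached in sys.path_importer_cache. From then on, PathFinder only asks those cached finder instances whether they can provide the requested module.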
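The lookup just described can be modelled in a few lines. This is a simplified sketch, not CPython's actual code (the real PathFinder also maps '' to the current directory and skips non-string entries), and finder_for with its explicit cache argument is a hypothetical name.

```python
def finder_for(path, hooks, cache):
    """Simplified model of how PathFinder resolves one sys.path entry:
    reuse a cached finder if there is one, otherwise ask each hook in
    order and keep the FIRST one that does not raise ImportError."""
    if path in cache:
        return cache[path]
    finder = None
    for hook in hooks:
        try:
            finder = hook(path)
        except ImportError:
            continue  # the hook declined this path entry
        break  # the first hook that accepts the entry wins
    cache[path] = finder  # cached even when no hook accepted (None)
    return finder
```

With hooks like [zipimport.zipimporter, FileFinder.path_hook(...)] and an ordinary directory, zipimporter raises ZipImportError (an ImportError subclass), the FileFinder hook accepts, and any hook after it is never consulted for that directory.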
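If you look at the contents of sys.path_importer_cache, you'll see that all of the directory entries from sys.path have been mapped to FileFinder instances. Non-directory entries (zip files, etc.) are mapped to other finders.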
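A quick way to look at that mapping yourself; the helper name is hypothetical, and the exact entries depend on your interpreter and sys.path.

```python
import sys


def describe_path_importer_cache():
    """Return {sys.path entry: finder type name (or None)} for every entry
    that PathFinder has already resolved and cached."""
    return {
        entry: None if finder is None else type(finder).__name__
        for entry, finder in sys.path_importer_cache.items()
    }


for entry, kind in describe_path_importer_cache().items():
    print(f"{kind!s:>15}  {entry}")
```

Directories typically show up as FileFinder, zip archives as zipimporter, and entries that no hook accepted as None.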
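So if you append a new factory created via FileFinder.path_hook to sys.path_hooks, your factory will only be called if the preceding FileFinder hook doesn't accept the path. That's unlikely, since FileFinder will work on any existing directory.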
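Alternatively, if you insert your new factory into sys.path_hooks ahead of the existing ones, the default hook will only be used if your new factory doesn't accept the path. And since FileFinder is so liberal, your factory will accept it, so only your loader is used, as you've already observed.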
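Both situations can be reproduced against the real import system with a small experiment. This sketch temporarily swaps sys.path_hooks, asks importlib.util.find_spec for a plain .py module, and then restores the previous state. xxx_only_hook is a hypothetical stand-in for the asker's hook (SourcelessFileLoader is only a placeholder loader here), and the helper names are made up for illustration.

```python
import importlib.util
import sys
from importlib.machinery import FileFinder, SourcelessFileLoader

# Hypothetical stand-in for the asker's '.xxx'-only hook.
xxx_only_hook = FileFinder.path_hook((SourcelessFileLoader, [".xxx"]))


def py_module_found(directory, module_name, hooks):
    """Return True if *module_name* (a plain .py file in *directory*) can
    be located while sys.path_hooks is temporarily set to *hooks*."""
    saved_hooks = list(sys.path_hooks)
    saved_cache = dict(sys.path_importer_cache)
    sys.path.insert(0, directory)
    try:
        sys.path_hooks[:] = hooks
        # Forget finders built by the old hooks; new ones are made on demand.
        sys.path_importer_cache.clear()
        return importlib.util.find_spec(module_name) is not None
    finally:
        sys.path.remove(directory)
        sys.path_hooks[:] = saved_hooks
        sys.path_importer_cache.clear()
        sys.path_importer_cache.update(saved_cache)


def compare(directory, module_name):
    """Appending vs. prepending the '.xxx' hook to the current hooks."""
    defaults = list(sys.path_hooks)
    return {
        "appended": py_module_found(directory, module_name,
                                    defaults + [xxx_only_hook]),
        "prepended": py_module_found(directory, module_name,
                                     [xxx_only_hook] + defaults),
    }
```

For a directory that only contains a plain .py module, the module should be found when the hook is appended (the default FileFinder claims the directory first) and not found when it is prepended (the '.xxx'-only FileFinder claims it first and never falls through).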
Making it work
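So you can either try to adjust the existing factory to support your file extension and importer (which is difficult, because the tuple of importers and extension strings is held in a closure), or do what I ended up doing, which is to add a new meta path finder.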
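A minimal sketch of the first option, assuming it is acceptable to rebuild the default hook instead of patching its closure: create one FileFinder hook that lists the default loaders plus the new one, and put it at the front of sys.path_hooks. The default list below is rebuilt from public constants in importlib.machinery and is assumed to match what CPython's _bootstrap_external uses internally (extension modules, source, bytecode); that may not hold on every version or implementation. Both function names are hypothetical.

```python
import sys
import importlib.machinery as machinery


def default_loader_details():
    """(loader, suffixes) pairs assumed equivalent to the stock directory
    hook, rebuilt from public importlib.machinery constants."""
    return [
        (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
        (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
        (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
    ]


def install_combined_hook(loader_cls, suffixes):
    """Insert one FileFinder hook that knows the default suffixes *and*
    the extra ones at the front of sys.path_hooks, and return it."""
    details = default_loader_details() + [(loader_cls, list(suffixes))]
    hook = machinery.FileFinder.path_hook(*details)
    # The hook still rejects non-directories, so zipimporter keeps
    # handling zip entries; for directories, this hook now wins.
    sys.path_hooks.insert(0, hook)
    # Drop finders created by the old hooks so directories are re-resolved.
    sys.path_importer_cache.clear()
    return hook
```

In the question's setup this would be called as install_combined_hook(ModuleLoader, ['.xxx']). Because a single FileFinder then knows every suffix, a package like pack with an __init__.py and a test.xxx is served by one finder, and .py/.pyc/extension modules keep working. When two files share a name (say test.py and test.xxx), the earlier entry in the list wins.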
So, for example, from my own project:
    import sys
    from importlib.abc import FileLoader
    from importlib.machinery import FileFinder, PathFinder
    from os import getcwd
    from os.path import basename

    from sibilant.module import prep_module, exec_module


    SOURCE_SUFFIXES = [".lspy", ".sibilant"]

    _path_importer_cache = {}
    _path_hooks = []


    class SibilantPathFinder(PathFinder):
        """
        An overridden PathFinder which will hunt for sibilant files in
        sys.path. Uses storage in this module to avoid conflicts with the
        original PathFinder
        """

        @classmethod
        def invalidate_caches(cls):
            for finder in _path_importer_cache.values():
                if hasattr(finder, 'invalidate_caches'):
                    finder.invalidate_caches()

        @classmethod
        def _path_hooks(cls, path):
            for hook in _path_hooks:
                try:
                    return hook(path)
                except ImportError:
                    continue
            else:
                return None

        @classmethod
        def _path_importer_cache(cls, path):
            if path == '':
                try:
                    path = getcwd()
                except FileNotFoundError:
                    # Don't cache the failure as the cwd can easily change to
                    # a valid directory later on.
                    return None
            try:
                finder = _path_importer_cache[path]
            except KeyError:
                finder = cls._path_hooks(path)
                _path_importer_cache[path] = finder
            return finder


    class SibilantSourceFileLoader(FileLoader):

        def create_module(self, spec):
            return None

        def get_source(self, fullname):
            return self.get_data(self.get_filename(fullname)).decode("utf8")

        def exec_module(self, module):
            name = module.__name__
            source = self.get_source(name)
            filename = basename(self.get_filename(name))

            prep_module(module)
            exec_module(module, source, filename=filename)


    def _get_lspy_file_loader():
        return (SibilantSourceFileLoader, SOURCE_SUFFIXES)


    def _get_lspy_path_hook():
        return FileFinder.path_hook(_get_lspy_file_loader())


    def _install():
        done = False

        def install():
            nonlocal done
            if not done:
                _path_hooks.append(_get_lspy_path_hook())
                sys.meta_path.append(SibilantPathFinder)
                done = True

        return install


    _install = _install()
    _install()
SibilantPathFinder overrides PathFinder and replaces only the methods that reference sys.path_hooks and sys.path_importer_cache, with similar implementations that look at this module's own _path_hooks and _path_importer_cache instead.
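During an import, the existing path finder tries to find a matching module. If it can't, my injected SibilantPathFinder re-walks sys.path and tries to find a match with one of my own file extensions.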
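The same idea can be sketched without subclassing PathFinder or relying on its private _path_hooks/_path_importer_cache methods: a standalone finder appended to sys.meta_path that keeps its own FileFinder per directory. This is a variation on the approach above, not the author's code; XxxLoader (which here simply treats .xxx files as Python source) and the other names are hypothetical placeholders for the asker's decrypting loader.

```python
import os
import sys
from importlib.machinery import FileFinder, SourceFileLoader


class XxxLoader(SourceFileLoader):
    """Hypothetical placeholder: treats .xxx files as plain Python source.
    Replace with a real (e.g. decrypting) loader."""


XXX_DETAILS = (XxxLoader, [".xxx"])


class XxxMetaPathFinder:
    """Appended to sys.meta_path, so it only runs after the default
    PathFinder has given up. It re-walks sys.path (or a package's
    __path__) with its own FileFinders, leaving sys.path_hooks and
    sys.path_importer_cache untouched."""

    _finders = {}  # directory -> FileFinder that only knows '.xxx'

    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        for entry in (sys.path if path is None else path):
            if not isinstance(entry, str):
                continue
            directory = entry or os.getcwd()  # '' means the current directory
            if not os.path.isdir(directory):
                continue  # zip files etc. are left to the default machinery
            finder = cls._finders.get(directory)
            if finder is None:
                finder = cls._finders[directory] = FileFinder(directory, XXX_DETAILS)
            spec = finder.find_spec(fullname, target)
            # Skip namespace-package specs (loader is None); the default
            # PathFinder already handles those.
            if spec is not None and spec.loader is not None:
                return spec
        return None

    @classmethod
    def invalidate_caches(cls):
        for finder in cls._finders.values():
            finder.invalidate_caches()


def install():
    """Append the finder once; repeated calls are harmless."""
    if XxxMetaPathFinder not in sys.meta_path:
        sys.meta_path.append(XxxMetaPathFinder)
```

Because the finder is appended, the default PathFinder still gets the first chance at every import, so .py, .pyc and .pyd files are unaffected. Submodules of a regular package are found because the import system passes the package's __path__ in as path.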
Figuring it out
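I ended up digging into the source of the _bootstrap_external module: https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py
The _install function and the PathFinder.find_spec method are the best places to start to understand why things work the way they do.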
Answer 1 (score: 1)
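…doesn't add anything to sys.meta_path. Instead, it installs a special hook in sys.path_hooks that acts almost like a combination of the PathFinder in sys.meta_path and the hooks in sys.path_hooks, except that, rather than just using the first hook that says "I can handle this path!", it tries all matching hooks in order until it finds one that actually returns a useful ModuleSpec from its find_spec method: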
    import os
    import sys
    from importlib.abc import PathEntryFinder


    @PathEntryFinder.register
    class MetaFileFinder:
        """
        A 'middleware', if you will, between the PathFinder sys.meta_path hook,
        and sys.path_hooks hooks--particularly FileFinder.

        The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
        it will handle *any* directory. So if one wants to insert another
        FileFinder.path_hook into sys.path_hooks, that will totally take over
        importing for any directory, and previous path hooks will be ignored.

        This class provides its own sys.path_hooks hook, which should be
        inserted early on sys.path_hooks so that it can supersede anything
        else. Its find_spec method then calls each hook on sys.path_hooks
        after itself and, for each hook that can handle the given sys.path
        entry, it calls the hook to create a finder, and calls that finder's
        find_spec. So each sys.path_hooks entry is tried until a spec is
        found or all finders are exhausted.
        """

        class hook:
            """
            Use this little internal class rather than a function with a closure
            or a classmethod or anything like that so that it's easier to
            identify our hook and skip over it while processing sys.path_hooks.
            """

            def __init__(self, basepath=None):
                # os.path.abspath(None) raises TypeError, so only normalize
                # when a basepath was actually given.
                self.basepath = (os.path.abspath(basepath)
                                 if basepath is not None else None)

            def __call__(self, path):
                if not os.path.isdir(path):
                    raise ImportError('only directories are supported', path=path)
                elif not self.handles(path):
                    raise ImportError(
                        'only directories under {} are supported'.format(
                            self.basepath), path=path)

                return MetaFileFinder(path)

            def handles(self, path):
                """
                Return whether this hook will handle the given path, depending on
                what its basepath is.
                """
                path = os.path.abspath(path)
                return (self.basepath is None or
                        os.path.commonpath([self.basepath, path]) == self.basepath)

        def __init__(self, path):
            self.path = path
            self._finder_cache = {}

        def __repr__(self):
            return '{}({!r})'.format(self.__class__.__name__, self.path)

        def find_spec(self, fullname, target=None):
            if not sys.path_hooks:
                return None

            last = len(sys.path_hooks) - 1

            for idx, hook in enumerate(sys.path_hooks):
                if isinstance(hook, self.__class__.hook):
                    continue

                finder = None
                try:
                    if hook in self._finder_cache:
                        finder = self._finder_cache[hook]
                        if finder is None:
                            # We've tried this finder before and got an ImportError
                            continue
                except TypeError:
                    # The hook is unhashable
                    pass

                if finder is None:
                    try:
                        finder = hook(self.path)
                    except ImportError:
                        pass

                    try:
                        self._finder_cache[hook] = finder
                    except TypeError:
                        # The hook is unhashable for some reason so we don't bother
                        # caching it
                        pass

                if finder is not None:
                    spec = finder.find_spec(fullname, target)
                    if (spec is not None and
                            (spec.loader is not None or idx == last)):
                        # If no __init__.<suffix> was found by any Finder,
                        # we may be importing a namespace package (which
                        # FileFinder.find_spec returns in this case). But we
                        # only want to return the namespace ModuleSpec if we've
                        # exhausted every other finder first.
                        return spec

            # Module spec not found through any of the finders
            return None

        def invalidate_caches(self):
            for finder in self._finder_cache.values():
                # Hooks that raised ImportError are cached as None; skip them.
                if finder is not None and hasattr(finder, 'invalidate_caches'):
                    finder.invalidate_caches()

        @classmethod
        def install(cls, basepath=None):
            """
            Install the MetaFileFinder in the front sys.path_hooks, so that
            it can support any existing sys.path_hooks and any that might
            be appended later.

            If given, only support paths under and including basepath. In this
            case it's not necessary to invalidate the entire
            sys.path_importer_cache, but only any existing entries under basepath.
            """
            if basepath is not None:
                basepath = os.path.abspath(basepath)

            hook = cls.hook(basepath)
            sys.path_hooks.insert(0, hook)
            if basepath is None:
                sys.path_importer_cache.clear()
            else:
                for path in list(sys.path_importer_cache):
                    if hook.handles(path):
                        del sys.path_importer_cache[path]
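This is still frustrating, and more complicated than it should be. I feel that on Python 2, before the import system rewrite, this was much simpler to do, because less of the support for the built-in module types (.py etc.) was built on top of the import hooks themselves, so it was harder to break importing normal modules by adding hooks that import new module types. I'm going to start a discussion on python-ideas to see if there's any way we can improve this situation.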