Question

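I am writing a program that categorizes a list of Python files by the modules they import. To do that, I need to scan a collection of .py files and return a list of the modules each one imports. For example, if one of the files contains the following lines: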

import os
import sys, gtk

I would like it to return:

["os", "sys", "gtk"]

I have played with modulefinder and written:

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('testscript.py')

print 'Loaded modules:'
for name, mod in finder.modules.iteritems():
    print '%s ' % name,

But this returns more than just the modules used in the script. For example, for a script that contains only:

import os
print os.getenv('USERNAME')

The ModuleFinder script returns these modules:

tokenize  heapq  __future__  copy_reg  sre_compile  _collections  cStringIO  _sre  functools  random  cPickle  __builtin__  subprocess  cmd  gc  __main__  operator  array  select  _heapq  _threading_local  abc  _bisect  posixpath  _random  os2emxpath  tempfile  errno  pprint  binascii  token  sre_constants  re  _abcoll  collections  ntpath  threading  opcode  _struct  _warnings  math  shlex  fcntl  genericpath  stat  string  warnings  UserDict  inspect  repr  struct  sys  pwd  imp  getopt  readline  copy  bdb  types  strop  _functools  keyword  thread  StringIO  bisect  pickle  signal  traceback  difflib  marshal  linecache  itertools  dummy_thread  posix  doctest  unittest  time  sre_parse  os  pdb  dis

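...whereas I just want it to return 'os', as that is the module used in the script.

Can anyone help me achieve this?

UPDATE: To clarify, I would like to do this without running the Python file being analyzed, just by scanning the code.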

Answer 1

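IMO the best way to do this is to use the http://furius.ca/snakefood/ package. The author has done all of the required work to get not only the directly imported modules; it also uses the AST to parse the code for runtime dependencies that a more static analysis would miss.

I worked up a command example to demonstrate: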

sfood ./example.py | sfood-cluster > example.deps

That will generate a basic dependency file listing each unique module. For even more detailed information, use:

sfood -r -i ./example.py | sfood-cluster > example.deps

 import os
 import compiler
 from compiler.ast import Discard, Const
 from compiler.visitor import ASTVisitor

 def pyfiles(startPath):
     r = []
     d = os.path.abspath(startPath)
     if os.path.exists(d) and os.path.isdir(d):
         for root, dirs, files in os.walk(d):
             for f in files:
                 n, ext = os.path.splitext(f)
                 if ext == '.py':
                     r.append([root, f])
     return r

 class ImportVisitor(object):
     def __init__(self):
         self.modules = []
         self.recent = []
     def visitImport(self, node):
         self.accept_imports()
         self.recent.extend((x[0], None, x[1] or x[0], node.lineno, 0)
                            for x in node.names)
     def visitFrom(self, node):
         self.accept_imports()
         modname = node.modname
         if modname == '__future__':
             return # Ignore these.
         for name, as_ in node.names:
             if name == '*':
                 # We really don't know...
                 mod = (modname, None, None, node.lineno, node.level)
             else:
                 mod = (modname, name, as_ or name, node.lineno, node.level)
             self.recent.append(mod)
     def default(self, node):
         pragma = None
         if self.recent:
             if isinstance(node, Discard):
                 children = node.getChildren()
                 if len(children) == 1 and isinstance(children[0], Const):
                     const_node = children[0]
                     pragma = const_node.value
         self.accept_imports(pragma)
     def accept_imports(self, pragma=None):
         self.modules.extend((m, r, l, n, lvl, pragma)
                             for (m, r, l, n, lvl) in self.recent)
         self.recent = []
     def finalize(self):
         self.accept_imports()
         return self.modules

 class ImportWalker(ASTVisitor):
     def __init__(self, visitor):
         ASTVisitor.__init__(self)
         self._visitor = visitor
     def default(self, node, *args):
         self._visitor.default(node)
         ASTVisitor.default(self, node, *args) 

 def parse_python_source(fn):
     contents = open(fn, 'rU').read()
     ast = compiler.parse(contents)
     vis = ImportVisitor() 

     compiler.walk(ast, vis, ImportWalker(vis))
     return vis.finalize()

 for d, f in pyfiles('/Users/bear/temp/foobar'):
     print d, f
     print parse_python_source(os.path.join(d, f))

Answer 2

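It depends how thorough you want to be. Determining the modules actually used is a Turing-complete problem: some Python code uses lazy importing to import only the things it actually uses on a particular run, and some generates the names to import dynamically (e.g. plugin systems).

python -v will trace import statements - it is arguably the simplest thing to check.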
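To illustrate why a purely static scan cannot be complete, here is a minimal sketch. The sample script and the helper name static_imports are made up for the example. The module name json only exists as a string built at run time, so scanning the import statements finds importlib and nothing else:

```python
import ast

# Hypothetical script: the second module's name is only known at run time.
source = '''
import importlib
plugin_name = "js" + "on"
plugin = importlib.import_module(plugin_name)
'''

def static_imports(code):
    """Collect module names from import statements without running the code."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found

print(static_imports(source))  # prints {'importlib'}; json is never seen
```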

Answer 3

You might want to try dis (pun intended):

import dis
from collections import defaultdict
from pprint import pprint

statements = """
from __future__ import (absolute_import,
                        division)
import os
import collections, itertools
from math import *
from gzip import open as gzip_open
from subprocess import check_output, Popen
"""

instructions = dis.get_instructions(statements)
imports = [__ for __ in instructions if 'IMPORT' in __.opname]

grouped = defaultdict(list)
for instr in imports:
    grouped[instr.opname].append(instr.argval)

pprint(grouped)

Output

defaultdict(<class 'list'>,
            {'IMPORT_FROM': ['absolute_import',
                             'division',
                             'open',
                             'check_output',
                             'Popen'],
             'IMPORT_NAME': ['__future__',
                             'os',
                             'collections',
                             'itertools',
                             'math',
                             'gzip',
                             'subprocess'],
             'IMPORT_STAR': [None]})

Your imported modules are grouped['IMPORT_NAME'].
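One thing to be aware of: dis.get_instructions on a source string only disassembles the module-level code, so imports inside functions or classes are not listed. A possible extension, sketched here under the assumption that walking the nested code objects in co_consts is acceptable (the function name imported_names is made up for this example), compiles the source without executing it and collects IMPORT_NAME from every scope:

```python
import dis
import types

def imported_names(source, filename="<string>"):
    """Return module names referenced by IMPORT_NAME, including nested scopes."""
    names = set()
    pending = [compile(source, filename, "exec")]  # compiles only, never runs
    while pending:
        code = pending.pop()
        for instr in dis.get_instructions(code):
            if instr.opname == "IMPORT_NAME":
                names.add(instr.argval)
        # Function and class bodies are stored as nested code objects.
        pending.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return sorted(names)

example = "import os\ndef f():\n    import json\n    from xml import dom\n"
print(imported_names(example))  # prints ['json', 'os', 'xml']
```

Relative imports such as "from . import x" show up with an empty module name, so you may want to filter those out.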

Answer 4

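Well, you could always write a simple script that searches the file for import statements. This one finds all imported modules and files, including those imported inside functions or classes: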

def find_imports(toCheck):
    """
    Given a filename, returns a list of modules imported by the program.
    Only modules that can be imported from the current directory
    will be included. This program does not run the code, so import statements
    in if/else or try/except blocks will always be included.
    """
    import imp
    importedItems = []
    with open(toCheck, 'r') as pyFile:
        for line in pyFile:
            # ignore comments and anything after an " as " alias
            line = line.strip().partition("#")[0].partition(" as ")[0].split(' ')
            if line[0] == "import":
                for imported in line[1:]:
                    # remove commas (this doesn't check for commas if
                    # they're supposed to be there!
                    imported = imported.strip(", ")
                    try:
                        # check to see if the module can be imported
                        # (doesn't actually import - just finds it if it exists)
                        imp.find_module(imported)
                        # add to the list of items we imported
                        importedItems.append(imported)
                    except ImportError:
                        # ignore items that can't be imported
                        # (unless that isn't what you want?)
                        pass

    return importedItems

toCheck = raw_input("Which file should be checked: ")
print find_imports(toCheck)

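This doesn't do anything for from module import something style imports, though that could easily be added, depending on how you want to handle them. It also doesn't do any syntax checking, so if you have some funny business like import sys gtk, os, it will think you've imported all three modules even though the line is an error. It also doesn't deal with try/except statements around imports: if the module can be imported, this function will list it. It also doesn't deal well with multiple imports per line if you use the as keyword. The real issue here is that I'd have to write a full parser to do this correctly. The given code works in many cases, as long as you understand that there are definite corner cases.

One issue is that relative imports will fail if this script isn't in the same directory as the given file. You may want to add the directory of the given script to sys.path.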
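The imp module used above is deprecated and was removed in Python 3.12. On Python 3, importlib.util.find_spec can play a similar role, and the sys.path adjustment can be folded into the same check. This is only a sketch; the helper name can_be_found and its script_path parameter are assumptions made for the example. It looks up only the top-level package name, because find_spec imports parent packages when given a dotted name:

```python
import importlib.util
import os
import sys

def can_be_found(module_name, script_path=None):
    """Check whether the top-level part of module_name can be located."""
    top_level = module_name.split(".")[0]
    # Optionally search the analysed script's own directory as well.
    extra_dir = os.path.dirname(os.path.abspath(script_path)) if script_path else None
    if extra_dir:
        sys.path.insert(0, extra_dir)
    try:
        return importlib.util.find_spec(top_level) is not None
    except (ImportError, ValueError):
        return False
    finally:
        if extra_dir:
            sys.path.remove(extra_dir)

print(can_be_found("os"))
print(can_be_found("surely_not_a_real_module_12345"))
```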

Answer 5

This works - it uses importlib to actually import the module, and inspect to get the members:

#! /usr/bin/env python
#
# test.py  
#
# Find Modules
#
import inspect, importlib as implib

if __name__ == "__main__":
    mod = implib.import_module( "example" )
    for i in inspect.getmembers(mod, inspect.ismodule ):
        print i[0]

#! /usr/bin/env python
#
# example.py
#
import sys 
from os import path

if __name__ == "__main__":
    print "Hello World !!!!"

Output:

tony@laptop .../~:$ ./test.py
path
sys

Answer 6

I understand that this post is very old, but I have found an ideal solution. I came up with this idea:

def find_modules(code):
    modules = []
    code = code.splitlines()
    for item in code:
        if item[:7] == "import " and ", " not in item:
            if " as " in item:
                modules.append(item[7:item.find(" as ")])
            else:
                modules.append(item[7:])
        elif item[:5] == "from ":
            modules.append(item[5:item.find(" import ")])

        elif ", " in item:
            item = item[7:].split(", ")
            modules = modules+item

        else:
            print(item)
    return modules

code = """
import foo
import bar
from baz import eggs
import mymodule as test
import hello, there, stack
"""
print(find_modules(code))

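It works with from, commas, and normal import statements. It requires no dependencies and works alongside other lines of code.

The above code prints: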

['foo', 'bar', 'baz', 'mymodule', 'hello', 'there', 'stack']

Pass your code to the find_modules function.
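The line-based approach above mishandles a few inputs, for example an alias combined with commas (import a as b, c), indented imports, and trailing comments. A slightly more forgiving variant might look like the sketch below; the function name find_modules_from_lines is made up, and it is still not a real parser (multi-line imports and imports inside strings will fool it):

```python
def find_modules_from_lines(code):
    """Extract module names from import lines using plain string handling."""
    modules = []
    for raw in code.splitlines():
        # Drop trailing comments and surrounding whitespace (handles indentation).
        line = raw.split("#", 1)[0].strip()
        if line.startswith("import "):
            # Handle "import a as b, c" by splitting on commas first.
            for part in line[len("import "):].split(","):
                name = part.strip().split(" as ")[0].strip()
                if name:
                    modules.append(name)
        elif line.startswith("from ") and " import " in line:
            modules.append(line[len("from "):line.find(" import ")].strip())
    return modules

sample = "import foo\nimport a as b, c  # comment\nfrom baz import eggs\n    import indented\nx = 1"
print(find_modules_from_lines(sample))  # prints ['foo', 'a', 'c', 'baz', 'indented']
```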

Answer 7

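I recently needed all the dependencies of a given Python script, and I took a different approach from the other answers. I only cared about top-level module names (e.g., I wanted foo from import foo.bar).

This is the code, using the ast module: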

import ast


modules = set()

def visit_Import(node):
    for name in node.names:
        modules.add(name.name.split(".")[0])

def visit_ImportFrom(node):
    # if node.module is missing it's a "from . import ..." statement
    # if level > 0 it's a "from .submodule import ..." statement
    if node.module is not None and node.level == 0:
        modules.add(node.module.split(".")[0])

node_iter = ast.NodeVisitor()
node_iter.visit_Import = visit_Import
node_iter.visit_ImportFrom = visit_ImportFrom

Testing with a Python file foo.py that contains:

# foo.py
import sys, os
import foo1
from foo2 import bar
from foo3 import bar as che
import foo4 as boo
import foo5.zoo
from foo6 import *
from . import foo7, foo8
from .foo12 import foo13
from foo9 import foo10, foo11

def do():
    import bar1
    from bar2 import foo
    from bar3 import che as baz

I can get all the modules in foo.py by doing something like this:

with open("foo.py") as f:
    node_iter.visit(ast.parse(f.read()))
print(modules)

which gives me this output:

set(['bar1', 'bar3', 'bar2', 'sys', 'foo9', 'foo4', 'foo5', 'foo6', 'os', 'foo1', 'foo2', 'foo3'])
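Since the question asks about a whole collection of .py files, the same idea can be wrapped in a function and applied to a directory. The sketch below is one way to do that, not the author's code: the names top_level_imports and imports_by_file are made up, it uses ast.walk instead of a NodeVisitor, and it assumes every file is valid, UTF-8 encoded Python (a file with a syntax error would raise SyntaxError):

```python
import ast
from pathlib import Path

def top_level_imports(source):
    """Return the sorted top-level module names imported by the given source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x", "from .sub import y").
            if node.module is not None and node.level == 0:
                modules.add(node.module.split(".")[0])
    return sorted(modules)

def imports_by_file(directory):
    """Map each .py file under directory to the modules it imports."""
    return {str(p): top_level_imports(p.read_text(encoding="utf-8"))
            for p in sorted(Path(directory).rglob("*.py"))}

print(top_level_imports("import os\nimport sys, gtk\n"))  # prints ['gtk', 'os', 'sys']
```

Nothing is executed or imported here; the files are only parsed.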

Answer 8

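For the majority of scripts, which only import modules at the top level, it is quite sufficient to load the file as a module and scan its members for modules: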

import io, types
scriptname = 'myfile.py'
with io.open(scriptname) as scriptfile:
    # text-mode files have read(), not readall()
    codeobj = compile(scriptfile.read(), scriptname, 'exec')
# types.ModuleType stands in for the deprecated imp.new_module
newmodule = types.ModuleType('__main__')
# note: this executes the script's top-level code
exec(codeobj, newmodule.__dict__)
scriptmodules = [name for name in dir(newmodule) if isinstance(newmodule.__dict__[name], types.ModuleType)]

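This simulates the module being run as a script by setting the module's name to '__main__'. It should therefore also capture funky dynamic module loading. The only modules it won't capture are those that are imported only into local scopes.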

Answer 9

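I was looking for something similar and I found a gem in a package called PyScons. The Scanner does just what you want (in 7 lines), using an import_hook. Here is an abbreviated example: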

import modulefinder, sys

class SingleFileModuleFinder(modulefinder.ModuleFinder):

    def import_hook(self, name, caller, *arg, **kwarg):
        if caller.__file__ == self.name:
            # Only call the parent at the top level.
            return modulefinder.ModuleFinder.import_hook(self, name, caller, *arg, **kwarg)

    def __call__(self, node):

        self.name = str(node)

        self.run_script(self.name)

if __name__ == '__main__':
    # Example entry, run with './script.py filename'
    print 'looking for includes in %s' % sys.argv[1]

    mf = SingleFileModuleFinder()
    mf(sys.argv[1])

    print '\n'.join(mf.modules.keys())

Answer 10

This actually works quite well:

print [key for key in locals().keys()
   if isinstance(locals()[key], type(sys)) and not key.startswith('__')]

Answer 11

Thanks to Tony Suffolk for the inspect and importlib samples... I built this wee module, and you're welcome to use it if it helps you. Giving back, yaaaay!

{{1}}

Answer 12

I know this is old, but I was also looking for a solution like the OP's. So I wrote this code to find the modules imported by the scripts in a folder. It works with both the import x and from x import y formats. I hope it helps someone else.

import abc

Answer 13

I'm editing my original answer. This is doable with a code snippet like the one below, but parsing the AST may be the best way to go.

import sys

def iter_imports(fd):
    """ Yield only lines that appear to be imports from an iterable.
        fd can be an open file, a list of lines, etc.
    """
    for line in fd:
        trimmed = line.strip()
        if trimmed.startswith('import '):
            yield trimmed
        elif trimmed.startswith('from ') and ('import ' in trimmed):
            yield trimmed

def main():
    # File name to read.
    filename = '/my/path/myfile.py'
    # Safely open the file, exit on error
    try:
        with open(filename) as f:
            # Iterate over the lines in this file, and generate a list of
            # lines that appear to be imports.
            import_lines = list(iter_imports(f))
    except (IOError, OSError) as exIO:
        print('Error opening file: {}\n{}'.format(filename, exIO))
        return 1
    else:
        # From here, import_lines should be a list of lines like this:
        #     from module import thing
        #     import os, sys
        #     from module import *
        # Do whatever you need to do with the import lines.
        print('\n'.join(import_lines))

    return 0

if __name__ == '__main__':
    sys.exit(main())

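Further string parsing is needed to get the module names. This also gives false matches when multi-line strings or docstrings contain lines that start with 'import ' or 'from X import '. That is why I suggest parsing the AST.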
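One way to do that further parsing without hand-written string handling is to feed each candidate line to ast.parse on its own. This is a rough sketch, not part of the original snippet: the function name modules_from_import_lines is made up, and lines that are not complete statements by themselves (such as the first line of a parenthesised multi-line import) are simply skipped:

```python
import ast

def modules_from_import_lines(lines):
    """Turn import-looking lines into module names by parsing each line alone."""
    modules = []
    for line in lines:
        try:
            tree = ast.parse(line.strip())
        except SyntaxError:
            continue  # not a complete statement on its own; skip it
        for node in tree.body:
            if isinstance(node, ast.Import):
                modules.extend(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom):
                # Keep leading dots so relative imports stay recognisable.
                modules.append("." * node.level + (node.module or ""))
    return modules

lines = ["from module import thing", "import os, sys", "from module import *"]
print(modules_from_import_lines(lines))  # prints ['module', 'os', 'sys', 'module']
```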

Return a list of imported Python modules used in a script?

13 answers: