Question

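I would like to run one Python file (file1) that simply loads several large files into memory as Python objects, and then, from a different Python file (file2), access those same objects without loading the files into memory again. My motivation is to be able to modify and develop file2 iteratively without wasting time reloading the same large files on every iteration.

In a Jupyter notebook this is easy: run the cell that loads the files once, and every other cell in the notebook can then access the objects. I would like to set up the same kind of cross-talk between separate Python files.

Is there a way to get Jupyter-style, cell-to-cell sharing of Python objects between separate .py files?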

(Edited to include an example)

Below is an example scenario; let's say there are two files:

file1.py:

import joblib  # sklearn.externals.joblib was removed from recent scikit-learn releases
q = joblib.load('large_dict') #load a large dictionary that has been saved to disk; let's say it takes 2 minutes to load
p = joblib.load('large_dict2') #load another large file; another 2 minutes load time

file2.py:

#notice that the q, p objects are never loaded, but they are accessible 
#(this is possible if the contents of these py files are in separate cells
#in a Jupyter notebook)
for experiment, chromosome in q.items():  # .iteritems() on Python 2
    pass  # stuff to work with dict object
for experiment, chromosome in p.items():
    pass  # stuff to work with dict object

I would like to run

python file1.py

once, and then run

python file2.py

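any number of times (that is, while iteratively modifying the code in file2). Note that in this scenario the objects created in file1.py are accessible to file2.py. My question is: is this scenario possible?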

Answer 1

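Objects do not belong to a specific file. The class they belong to, or the function that generated them, may come from a module that "physically" resides in a different file, but that does not matter. As long as you stay within a single Python interpreter session, there is no need to copy objects.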
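A minimal sketch of that point, using a stand-in module built in memory (the names `loader_demo` and `big` are hypothetical and only serve this illustration): importing a name from a module binds it to the same object rather than copying it.

```python
import sys
import types

# Build a stand-in for "file1" in memory; loader_demo and big are
# hypothetical names used only for this illustration.
loader_demo = types.ModuleType("loader_demo")
loader_demo.big = {"experiment_1": [1, 2, 3]}
sys.modules["loader_demo"] = loader_demo

# This is what "file2" would do: import the already-loaded object.
from loader_demo import big

# Same object, not a copy: a change made through one name is
# visible through the other.
same_object = big is loader_demo.big
big["experiment_2"] = [4, 5]
visible_in_module = "experiment_2" in loader_demo.big
print(same_object, visible_in_module)
```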

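There is one catch: if you modify a module and want to load the newest version into a running Python interpreter that has already imported it, the interpreter will "refuse" to do so. (This is actually a performance optimization, so that you can safely import a module more than once.)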

You can "force" the Python interpreter to reload a module with importlib.reload in Python 3, or with the reload built-in in Python 2. See this question for details.
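The caching behaviour can be observed with a small experiment. This is a sketch under stated assumptions: `demo_mod` is a hypothetical throwaway module written to a temporary directory, and bytecode writing is switched off so that rewriting the file within the same second cannot leave a stale `.pyc` behind.

```python
import importlib
import os
import sys
import tempfile

# Avoid stale bytecode when the file is rewritten within the same second.
sys.dont_write_bytecode = True

# demo_mod is a hypothetical throwaway module in a temp directory.
tmp_dir = tempfile.mkdtemp()
mod_path = os.path.join(tmp_dir, "demo_mod.py")
with open(mod_path, "w") as f:
    f.write("VALUE = 1\n")

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
sys.modules.pop("demo_mod", None)  # start clean if the name was used before
import demo_mod

first = demo_mod.VALUE

# Edit the module on disk, as you would in an editor.
with open(mod_path, "w") as f:
    f.write("VALUE = 'changed'\n")

import demo_mod  # no effect: the module is already in sys.modules
after_plain_import = demo_mod.VALUE

importlib.reload(demo_mod)  # re-executes the module's source
after_reload = demo_mod.VALUE

print(first, after_plain_import, after_reload)
```

A second plain `import` returns the cached module, so the edit only becomes visible after `importlib.reload`.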

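In your example the data will not be shared, because you have two different Python processes. Data is not shared between two processes. (In general, two "C processes", meaning programs written in C, do not share any data either. They can send data to each other, but that requires copying, which is what you want to avoid.)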
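As a rough illustration of the process boundary, the sketch below defines `q` in the current interpreter and then starts a second interpreter that tries to print it. The tiny dictionary is only a stand-in for the large one in the question.

```python
import subprocess
import sys

# In this process, q exists (a small stand-in for the large dictionary).
q = {"experiment_1": "chromosome_1"}

# A second interpreter process starts with its own empty namespace,
# so the same name is undefined there.
child = subprocess.run(
    [sys.executable, "-c", "print(q)"],
    capture_output=True,
    text=True,
)
child_failed = child.returncode != 0
child_saw_name_error = "NameError" in child.stderr
print(child_failed, child_saw_name_error)
```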

But you can import the data and the functions into one "shared" Python interpreter.

file1.py:

import joblib  # sklearn.externals.joblib was removed from recent scikit-learn releases
q = joblib.load('large_dict') #load a large dictionary that has been saved to disk; let's say it takes 2 minutes to load
p = joblib.load('large_dict2') #load another large file; another 2 minutes load time

file2.py:

from file1 import q, p
# q and p are loaded once, when file1 is first imported; reloading
# file2 reuses the cached file1 module and does not load them again
for experiment, chromosome in q.items():
    pass  # stuff to work with dict object
for experiment, chromosome in p.items():
    pass  # stuff to work with dict object

file3.py:

import importlib

import file2
# file2.py will be executed once when importing
# attributes will be accessible by file2.attribute_name

inp = ""
while inp != "quit":
    inp = input("Type anything to reload or 'quit' to quit")

    # 'import file2' would **not** execute file2 because imports
    # are only done once. Use importlib.reload (or reload in python2)
    # to "force" reloading of module 
    importlib.reload(file2)

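You can then start execution with python file3.py, which waits for any input before reloading file2. Of course, the mechanism that decides when to reload can be made arbitrarily elaborate, for example reloading whenever file2.py changes (watchdog may help with that).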
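One way to reload on change can be sketched with nothing but the standard library. Note that this uses plain polling of the file's modification time rather than watchdog, and that `reload_if_changed` and `watched_demo` are hypothetical names introduced for this illustration.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # keep the demo free of stale .pyc files


def reload_if_changed(module, last_mtime):
    """Reload `module` if its source file is newer than `last_mtime`.

    Returns the modification time to pass in on the next call.
    """
    current_mtime = os.path.getmtime(module.__file__)
    if current_mtime > last_mtime:
        importlib.reload(module)
    return current_mtime


# Demo with a throwaway module standing in for file2.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "watched_demo.py")
with open(path, "w") as f:
    f.write("ANSWER = 'old'\n")

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
sys.modules.pop("watched_demo", None)
import watched_demo

mtime = os.path.getmtime(path)
mtime = reload_if_changed(watched_demo, mtime)  # file unchanged: no reload
unchanged_value = watched_demo.ANSWER

with open(path, "w") as f:
    f.write("ANSWER = 'new version'\n")
os.utime(path, (mtime + 10, mtime + 10))  # simulate a later save

mtime = reload_if_changed(watched_demo, mtime)  # file changed: reload
reloaded_value = watched_demo.ANSWER
print(unchanged_value, reloaded_value)
```

In a setup like file3.py, the same function could be called in a loop with a short `time.sleep` between checks, passing `file2` as the module.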

Another approach is to use something like

file4.py:

import importlib
import file2

def reload():
    importlib.reload(file2)

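Then run python -i file4.py. You are then in a normal Python interpreter, but reload() will reload (that is, execute) file2.

Note that you can do the same in a Jupyter/IPython notebook. There are even magic commands that help with this. See the documentation for details.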
