Question

This may get flagged as a duplicate, since I have gone through a lot of posts on this topic, but I really need help working this out.

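So, say I have a master script used to plot some data, and then several other scripts that I would like to import into the master plotting script in order to retrieve that data.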

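I found that the most pythonic way to do this would be to import the script and then use it like a module, but as I am new to Python I can't figure out how to do it properly with this code. I am using code written by someone else and adapting it to my purpose, which might also be part of the problem. I would like to save some resources and avoid writing files to disk and reading them back at every step, which is why I want to extract the data inside my plotting script and keep my pandas Panel in memory for plotting.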

Ideally, my master_plot.py would look something like this:

#functions for the plotting part

if __name__ == '__main__':
    import Tab

    data= Tab.main(some_path) #this script would calculate tab and return a pd.panel that will be used by the plotting section
    plot(data)

And now the actual Tab script:

#!/usr/bin/env python


class SubDomains( object ):
    '''
    a class that reads in and prepares a dataset to be used in masking
    '''
    def __init__( self, fiona_shape, rasterio_raster, id_field, **kwargs ):
        #do stuff
    @staticmethod
    def rasterize_subdomains( fiona_shape, rasterio_raster, id_field ):

        from rasterio.features import rasterize
        import six
        import numpy as np

        #do stuff
    def _domains_name( self, fiona_shape, rasterio_raster, **kwargs ):
        #do stuff
    def _domains_generator( self, **kwargs ):

        #do stuff

def extract_tab( filelist, subdomains_arr ):
    '''
    extract the number of burned pixels across subdomains
    '''
    def read_firescar( x ):
        #do stuff

    def tab_counts( x ):
        #do stuff

    def get_year_fn( x ):
        #do stuff


    return pd.DataFrame( rep_arr ).T

def tab_processing(maps_path):

    l = pd.Series( sorted( glob.glob( os.path.join( maps_path, 'FireScar*.tif' ) ) ) )

    # now lets groupby the repnum by splitting the filenames with a function
    def get_rep_fn( x ):
        ''' function to split the repnum out of an ALF output filename '''
        return os.path.basename( x ).split( '_' )[ 1 ]

    rep_groups = l.groupby( l.map( get_rep_fn ) )

    # initialize a SubDomains object 
    sub_domains = SubDomains( shp, rasterio.open( l[0] ), 'Id' )

    # now extract the newly rasterized sub domains numpy array
    subdomains_arr = sub_domains.sub_domains

    pool = mp.Pool( 32 )
    out = pool.map( lambda x: extract_tab( x, subdomains_arr ), [ group.tolist() for rep_num, group in rep_groups ] )
    pool.close()

    # now there is a list of pd.DataFrame objects that we can collapse into a single 3-D pd.Panel
    rep_list = sorted( [ int(rep_num) for rep_num, group in rep_groups ] )
    year_list = sorted( out[0].index.astype(int) )
    # rep_list = np.repeat( rep_list, len(year_list) )
    # year_list = np.array([year_list for i in range(len(rep_list)) ]).ravel()

    tab_panel = pd.Panel( { i:j for i,j in zip(rep_list, out) } )
    return tab_panel

if __name__ == '__main__':

    import os, sys, re, rasterio, glob, fiona, shapely
    import pandas as pd
    import numpy as np
    from pathos import multiprocessing as mp
    from collections import defaultdict

    path = sys.argv[1]
    tab_processing(path)

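This script used to be standalone, so I am not sure whether the best approach is to call it as a subprocess or whether it can still be used as a module. One of the problems with this exact version is the error "global name 'pd' is not defined", which makes sense, but I really don't see how to work around it, since all of the functions and classes are used by the tab_processing function. I would like to be able to run the whole thing rather than just one function, so would a subprocess be the best way to go?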
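For what it's worth, the imports in the Tab script sit inside the `if __name__ == '__main__':` block, and that block only runs when the file is executed directly. When the file is imported, the block is skipped, so names like pd are never bound in the module. Below is a minimal sketch of the layout I think I am after, with the imports moved to module level and a main() that returns the result in memory. It is a stdlib-only stand-in, not the real Tab script: the file name tab_sketch.py, the main() wrapper, and the example file name pattern FireScar_<rep>_<year>.tif are assumptions for illustration, and the real script would put its pandas, rasterio, and pathos imports at the top in the same way.

```python
# tab_sketch.py: hypothetical, stdlib-only stand-in for the Tab script.
# Imports live at module level, so they run on "import tab_sketch" as well
# as when the file is executed directly.
import glob
import os
import sys
from collections import defaultdict


def get_rep_fn(path):
    """Split the rep number out of a file name.

    Assumes names shaped like FireScar_<rep>_<year>.tif (an assumption).
    """
    return os.path.basename(path).split('_')[1]


def tab_processing(maps_path):
    """Group the FireScar files by rep number and return the result in memory."""
    files = sorted(glob.glob(os.path.join(maps_path, 'FireScar*.tif')))
    groups = defaultdict(list)
    for f in files:
        groups[get_rep_fn(f)].append(f)
    return dict(groups)


def main(maps_path):
    """Entry point shared by the command line and by importing scripts."""
    return tab_processing(maps_path)


# Only the command-line handling stays under the guard.
if __name__ == '__main__' and len(sys.argv) > 1:
    main(sys.argv[1])
```

With this layout, master_plot.py could do "import tab_sketch" followed by "data = tab_sketch.main(some_path)", and no subprocess or intermediate file on disk would be needed to pass the result along.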

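Not sure whether it changes anything, but this goes through a large amount of data, so it needs to be very efficient in terms of resources.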

Pythonic way to import the results of a "complex" script into another script

0 answers: