I have a Python script that essentially runs a daily data-pull process; within that script it runs 20 sub-scripts, and each sub-script runs a SQL query.
The script is on Task Scheduler; it starts at 3 a.m. and finishes roughly 2 hours later.
I would like to speed this up by running all of the sub-scripts at the same time.
I have looked into threading and multiprocessing, but I am still fairly new to those concepts and to computer science in general. What is the best way to approach this?
import os
import sys
import importlib
import pandas as pd
# Key folder names and locations
# Main folder location
daily_data_path = 'Desktop:\\Daily_Data'
# Login folder name - must be saved in your desktop and must contain a module called login.py
login_folder = 'Login'
# Login folder location
login_folder_path = os.path.expanduser('~') + '\\Desktop\\' + login_folder
# Direct Python to the correct directories
sys.path.append(daily_data_path)
sys.path.append(login_folder_path)
# Function to dynamically import and run each script
def script_import(path_name):
    try:
        # import_module both imports and executes the sub-script module
        importlib.import_module(path_name)
        run_ind = '1'
    except Exception:
        run_ind = '0'
# List of daily data paths to iterate over
paths = ['script1.Scripts.script1',
         'script2.Scripts.script2',
         # ...,
         'script20.Scripts.script20'
         ]
for path in paths:
    # Call the script_import function to run the daily data process
    script_import(path)
Answer 0 (score: 0)
You can use a worker pool from the multiprocessing module to execute the scripts in separate processes. The Pool class provides a map_async method, which works like the built-in map function but runs all of the calls in parallel. The code could look something like this:
import importlib
import multiprocessing
def script_import(name):
    importlib.import_module(name)

if __name__ == '__main__':
    # List of daily data paths to iterate over
    paths = []

    # create process pool
    procpool = multiprocessing.Pool()

    # apply 'script_import' function in parallel
    result = procpool.map_async(script_import, paths)

    # close the pool and wait for processes to complete
    procpool.close()
    procpool.join()
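
One caveat worth adding (this is not part of the original answer): map_async only re-raises exceptions from the workers when .get() is called on the result object, so a failed script can otherwise go unnoticed. A minimal self-contained sketch, with placeholder paths standing in for the 20 real ones:

import importlib
import multiprocessing

def script_import(name):
    importlib.import_module(name)

if __name__ == '__main__':
    # placeholder paths - substitute the real dotted paths from the question
    paths = ['script1.Scripts.script1', 'script2.Scripts.script2']

    procpool = multiprocessing.Pool()
    result = procpool.map_async(script_import, paths)
    procpool.close()
    procpool.join()

    # .get() re-raises the first exception raised in any worker process,
    # so a failing daily script is reported instead of silently ignored
    result.get()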
Answer 1 (score: 0)
It sounds like your sub-scripts can run independently of the main script, so I would recommend using the subprocess module,
e.g. subprocess.Popen.
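
A minimal sketch of that approach, assuming each daily extract is a stand-alone .py file (the file names below are placeholders):

import subprocess
import sys

# placeholder names for the 20 stand-alone daily scripts
scripts = ['script1.py', 'script2.py', 'script20.py']

# launch every script in its own Python process without waiting
processes = [subprocess.Popen([sys.executable, script]) for script in scripts]

# wait for all of them to finish and report any non-zero exit codes
for script, proc in zip(scripts, processes):
    returncode = proc.wait()
    if returncode != 0:
        print(f'{script} exited with code {returncode}')

Because each script runs in a completely separate interpreter, this also sidesteps any shared-state issues between the 20 extracts.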