I have a parallel job in Python that reads data from a database, does some manipulation, and runs Dijkstra's algorithm:
import pandas as pd
import igraph as ig
import psutil
import swifter  # registers the .swifter accessor on DataFrames
from multiprocess import Pool

# conn, input_region, candid_sale_invoices, candid_barcodes and log_transform
# are not shown here; they are defined earlier in the script.
t1 = 200101
t2 = 200229
pool = Pool(psutil.cpu_count(logical=False))  # one worker per physical core

def graph_analysis(i):
    input_date = str(i)
    # Barcode/invoice pairs for one date and region
    sql_data = """select trim(cast(p.Barcode as nvarchar(20))) Barcode ,cast(s.invoiceid as
    nvarchar(20)) invoiceid
    from sales s inner join Product_981115 p on s.productid = p.productid
    where s.date = """ + input_date + """ and s.qty != 0 and p.sectionid != 1691.199 and s.RegionID = """ + input_region
    # Read the result set in chunks, then combine them
    data = []
    for chunk in pd.read_sql(sql_data, conn, chunksize=1000000):
        data.append(chunk)
    data = pd.concat(data, ignore_index=True)
    data = data.merge(candid_sale_invoices)
    data = data.merge(candid_barcodes)
    # Columns 2-4 hold (source, target, weight)
    final_edges_df = data.iloc[:, [2, 3, 4]]
    final_edges_tuples = [tuple(x) for x in final_edges_df.values]
    Gm = ig.Graph.TupleList(final_edges_tuples, directed=True, edge_attrs=['weight'])
    # All-pairs shortest paths on the weighted graph
    longest_paths = pd.DataFrame(Gm.shortest_paths_dijkstra(None, None, weights='weight'))
    longest_paths = longest_paths.swifter.apply(log_transform)
    longest_paths["Date"] = input_date
    longest_paths["RegionID"] = input_region
    return longest_paths

results = pool.map(graph_analysis, range(t1, t2 + 1))
pool.close()
results = pd.concat(results, ignore_index=True)
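For readers unfamiliar with the igraph call, the graph step amounts to all-pairs shortest paths over weighted directed edges. The following is only a minimal pure-Python sketch of that idea, not what igraph does internally: it assumes non-negative weights, the names `dijkstra`, `all_pairs` and `toy_edges` are hypothetical, and unlike igraph (which reports unreachable pairs as infinity) it simply omits unreachable nodes.

```python
import heapq
from collections import defaultdict


def dijkstra(adj, source):
    """Return shortest distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def all_pairs(edges):
    """Run Dijkstra from every node of a directed (src, dst, weight) edge list."""
    adj = defaultdict(list)
    nodes = set()
    for src, dst, w in edges:
        adj[src].append((dst, w))
        nodes.update((src, dst))
    return {n: dijkstra(adj, n) for n in sorted(nodes)}


if __name__ == "__main__":
    # Hypothetical toy edges, standing in for final_edges_tuples
    toy_edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)]
    for node, dist in all_pairs(toy_edges).items():
        print(node, dist)
```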
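A few days ago I ran this code and it parallelized perfectly, using almost all of the cores. When I ran it today, however, the worker processes appear to be spawned, but the cores are not working in parallel.
The system has 128 GB of RAM and 32 cores, and nothing on it has changed since the last successful parallel run. I restarted the system to rule out any transient issue, but the problem persists. So what could be the problem?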
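One way to narrow this down might be a small check like the one below. It is only a sketch under some assumptions: it uses the standard-library `multiprocessing` rather than `multiprocess`, the names `busy_task`, `tasks_per_worker` and `run_check` are hypothetical, and `busy_task` is a CPU-bound stand-in for `graph_analysis`. It reports how many tasks each worker PID handled and, on Linux, how many cores the parent process is allowed to use.

```python
import os
import time
from collections import Counter
from multiprocessing import Pool


def busy_task(i):
    """CPU-bound stand-in for graph_analysis; returns the PID that ran it."""
    deadline = time.perf_counter() + 0.2
    x = 0
    while time.perf_counter() < deadline:
        x += 1  # spin to keep one core busy for about 0.2 s
    return i, os.getpid()


def tasks_per_worker(results):
    """Count how many tasks each worker PID handled."""
    return Counter(pid for _, pid in results)


def run_check(n_workers=4, n_tasks=16):
    """Map busy_task over a pool and summarize the work per worker."""
    with Pool(n_workers) as pool:
        results = pool.map(busy_task, range(n_tasks), chunksize=1)
    return tasks_per_worker(results)


if __name__ == "__main__":
    # os.sched_getaffinity only exists on some platforms (e.g. Linux)
    if hasattr(os, "sched_getaffinity"):
        print("cores this process may use:", len(os.sched_getaffinity(0)))
    print("tasks per worker PID:", dict(run_check()))
```

If the tasks are spread over several PIDs but only one core is busy, the allowed-core count would be the first thing to look at. If a single PID handled everything, the pool itself is not distributing the work.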
Thanks.