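I have pandas DataFrames and I want to apply a function to them. I need to run many iterations, so I thought using multiple threads would be a good idea.
It looks like this: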
import threading

def my_function(data_inputs_train):
    # ..... do something with dataframe ....
    # ..... group by, for loops, etc. .......
    # ..... create new dataframe .....
    return newPandasDataFrame

class myThread(threading.Thread):
    def __init__(self, threadID, data_inputs_train):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.data_inputs_train = data_inputs_train

    def run(self):
        result_df = my_function(self.data_inputs_train)

thread1 = myThread(1, data_inputs_train)
thread2 = myThread(2, data_inputs_train)
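So both threads should return a new DataFrame, and once both threads have finished I want to concatenate the two results they returned.
How can I do that? How do I return an object from the run() function, and how do I access it on my thread1 object?
Thanks!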
Update after trying the first answer: it does not work, and there are indentation problems as well.
class myThread(threading.Thread):
    def __init__(self, threadID, name, sleep, cust_type, data_inputs_train):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.sleep = sleep
        self.cust_type = cust_type
        self.data_inputs_train = data_inputs_train
        # here I need to get the newPandasDataFrame object.
        result_df = fdp.optimze_score_and_cl(data_inputs_train)

    def returnTheData(self):
        return result_df
Answer 0 (score: 0)
So this is the basis of your program. I am only using sample data to show how to set it up.
import threading

import pandas as pd

def myFunction(x):
    # sample data standing in for the real processing
    df = pd.DataFrame(['1', '2'], columns=['A'])
    return df

class myThreads(threading.Thread):
    def __init__(self, threadID, name, sleep, cust_type, data_inputs_train):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.sleep = sleep
        self.cust_type = cust_type
        self.data_inputs_train = data_inputs_train
        self.result_df = None

    def run(self):
        # call the methods you need on your data; doing this in run(),
        # not in __init__, is what makes the work happen inside the thread
        self.result_df = myFunction(self.data_inputs_train)

    def returnTheData(self):
        return self.result_df

df = pd.DataFrame(['1'], columns=['A'])
thread1 = myThreads(1, "EX1", 1, 'EX', df)
thread2 = myThreads(2, "IN1", 2, 'IN', df)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
df1 = thread1.returnTheData()
df2 = thread2.returnTheData()
print(df1)
print(df2)
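You declare the threads, start them, and basically let them run whatever they need to.
join() lets the main function wait until all threads have finished their processing.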
df2 = thread2.returnTheData()
You simply call a method that returns the data you need.
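The question also asks how to combine the two results once both threads are done. One possible sketch, which swaps the hand-written Thread subclass for the standard library's concurrent.futures.ThreadPoolExecutor, is shown below. The body of my_function and the sample column "A" are made up for illustration; only the function name and data_inputs_train come from the question.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def my_function(data_inputs_train):
    # Hypothetical stand-in for the real work: returns a new DataFrame.
    return data_inputs_train.assign(B=data_inputs_train["A"] * 2)


# Made-up sample input; the real data would come from your own code.
data_inputs_train = pd.DataFrame({"A": [1, 2]})

# Each submit() schedules one call on a worker thread and returns a Future.
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(my_function, data_inputs_train) for _ in range(2)]
    # result() blocks until that call has finished and hands back its return value.
    results = [future.result() for future in futures]

# Stack the two returned frames into a single DataFrame.
combined = pd.concat(results, ignore_index=True)
print(combined)
```

Here future.result() plays the role of join() followed by returnTheData(). Whether threads actually speed this up depends on the workload: pure-Python loops are limited by CPython's global interpreter lock, so for CPU-bound work a ProcessPoolExecutor may be worth trying instead.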
Working code