Python multiprocessing question: parameters set/list complete process

Question

I think another post covers exactly what I'm trying to do: Python multiprocessing pool.map for multiple arguments

What I'm trying to implement is shown in the pseudocode below. find_similar is called by another function in my code:

def find_similar(db, num, listofsets):
    # db is the sqlite3 database
    # num is a variable I need for the SQL query
    # listofsets is a list of sets; each set is a set of strings

    threshold = 0.49

    similar_db_rows = []

    # Bind num as a query parameter instead of formatting it into the SQL string
    query = "SELECT thing1, thing2, thing3 FROM table WHERE num != ?;"
    for row in db.execute(query, (num,)):
        # thing3 is a long string, each value separated by a comma;
        # it is the third selected column, so its index is 2
        items = set(row[2].strip().split(','))
        for set_item in listofsets:
            sim_score = sim_function(set_item, items)
            if sim_score < threshold:
                similar_db_rows.append(row)
    return similar_db_rows

def sim_function(x, y):
    # x is a set, and y is a second set. The function does some calculation
    # and comparing, then returns a float value
    # ... (calculation omitted) ...
    return float_value

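This works. What I'm trying to do is use multiprocessing for the second for loop. Instead of iterating over each set and calling the function (my list of sets can be long, and this is a major bottleneck), I want multiprocessing to call the function on many sets at a time, passing each set together with the second, constant argument that comes from the SQL query, and then return the result for each set in a list. Once all the sets have been processed, I can use the items in that list to check whether any of them meets the threshold.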

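I tried Sebastian's func_star with pool.map(func_star, itertools.izip(a_args, itertools.repeat(second_arg))), as well as zeehio's parmap. But in my case, if I have 30 sets in the list, for example, it returns the result list more than 30 times. Each time it returns, it checks the similarity threshold and appends the row, but it never breaks out of this, and I end up Ctrl-Z'ing the whole thing.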

Here is an example of what I tried, first using parmap:

import parmap

def find_similar(db, num, listofsets):
    # db is the sqlite3 database
    # num is a variable I need for the SQL query
    # listofsets is a list of sets; each set is a set of strings

    threshold = 0.49

    list_process_results = []
    similar_db_rows = []

    # Bind num as a query parameter instead of formatting it into the SQL string
    query = "SELECT thing1, thing2, thing3 FROM table WHERE num != ?;"
    for row in db.execute(query, (num,)):
        # thing3 is the third selected column, so its index is 2
        items = set(row[2].strip().split(','))
        # items is passed as the extra, constant argument to sim_function
        list_process_results = parmap.starmap(sim_function, zip(listofsets), items)
        print(list_process_results)

        if any(t < threshold for t in list_process_results):
            # print("appending a row")
            similar_db_rows.append(row)
    return similar_db_rows

And with func_star:

import itertools
from multiprocessing import Pool

def func_star(a_b):
    """Convert `f([1,2])` to `f(1,2)` call."""
    return sim_function(*a_b)

def find_similar(db, num, listofsets):
    pool = Pool()

    # db is the sqlite3 database
    # num is a variable I need for the SQL query
    # listofsets is a list of sets; each set is a set of strings

    threshold = 0.49

    list_process_results = []
    similar_db_rows = []

    # Bind num as a query parameter instead of formatting it into the SQL string
    query = "SELECT thing1, thing2, thing3 FROM table WHERE num != ?;"
    for row in db.execute(query, (num,)):
        # thing3 is the third selected column, so its index is 2
        items = set(row[2].strip().split(','))
        # itertools.izip is Python 2 only; on Python 3 use the built-in zip
        list_process_results = pool.map(func_star, itertools.izip(listofsets, itertools.repeat(items)))
        print(list_process_results)

        if any(t < threshold for t in list_process_results):
            # print("appending a row")
            similar_db_rows.append(row)
    return similar_db_rows

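The same thing happens with both: it keeps going forever, returning the number of lists I'm expecting (with a different set of values each time), "appending a row", and never breaking out.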

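Thanks for your help! A bonus would be if multiprocessing could also be used for the results of the row query (the outer loop), but I'll conquer the inner loop first.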

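In answer to Dano's question about find_similar(): I have another function that contains a for loop. Each iteration of that for loop calls find_similar. When the result list is returned from find_similar, it prints the length of the returned list, then completes the remainder of the loop body and moves on to the next element. Once this for loop finishes, the function ends, and find_similar is not called again.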

Answer 1

Here's a slightly nicer-looking version that uses functools.partial instead of izip / repeat / func_star.

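from functools import partial
from multiprocessing import Pool

def sim_function(row_set, set_from_listofsets): # Note that the arguments are reversed from what you had before
    pass

def find_similar(db, num, listofsets):
    pool = Pool()
    threshold = 0.49

    similar_db_rows = []
    # Bind num as a query parameter instead of formatting it into the SQL string
    query = "SELECT thing1, thing2, thing3 FROM table WHERE num != ?;"
    for row in db.execute(query, (num,)):
        # thing3 is the third selected column, so its index is 2;
        # partial fixes the row's set as the first argument of sim_function
        func = partial(sim_function, set(row[2].strip().split(',')))
        list_process_results = pool.map(func, listofsets)
        print(list_process_results)

        if any(t < threshold for t in list_process_results):
            # print("appending a row")
            similar_db_rows.append(row)
    pool.close()
    pool.join()
    return similar_db_rows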
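
The question also mentions, as a bonus, parallelizing the outer loop over the query results. Here is a minimal sketch of one way that might look on Python 3, again using functools.partial. It assumes the rows have already been fetched into a plain list (for example with fetchall()), because only picklable data can be sent to the workers and an sqlite3 connection cannot be pickled. The names row_is_similar and find_similar_rows are hypothetical, and the Jaccard distance in sim_function is only a stand-in for the real calculation.

```python
from functools import partial
from multiprocessing import Pool


def sim_function(x, y):
    """Hypothetical stand-in: Jaccard distance between two sets (0.0 = identical)."""
    union = x | y
    if not union:
        return 0.0
    return 1.0 - len(x & y) / len(union)


def row_is_similar(listofsets, threshold, row):
    """Return True if any set in listofsets scores below threshold against row[2]."""
    items = set(row[2].strip().split(','))
    return any(sim_function(s, items) < threshold for s in listofsets)


def find_similar_rows(rows, listofsets, threshold=0.49, processes=None):
    """Check whole rows in parallel; rows must be a list of already-fetched tuples."""
    # partial fixes the two constant arguments, so pool.map only varies the row
    check = partial(row_is_similar, listofsets, threshold)
    with Pool(processes) as pool:
        flags = pool.map(check, rows)
    return [row for row, flag in zip(rows, flags) if flag]


if __name__ == "__main__":
    # Made-up sample rows shaped like (thing1, thing2, thing3)
    rows = [(1, "a", "x,y,z"), (2, "b", "p,q")]
    print(find_similar_rows(rows, [{"x", "y"}], processes=2))
```

With one task per row, each worker does all the set comparisons for its row, so there are fewer and larger tasks than with one pool.map call per row. Whether that is actually faster depends on how expensive sim_function is compared with the cost of pickling the rows and the list of sets.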

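The behavior you're describing is strange, though. I don't see why either of your previous versions would run in an infinite loop, especially on a non-Windows platform.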
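
For context on the Windows remark: when worker processes are started with the spawn method (the default on Windows), each worker re-imports the main module, so any pool-creating code that runs at import time is executed again inside every worker. Whether that is what happened here cannot be told from the question. The following is only a sketch, for Python 3, of the guarded layout that avoids the problem; the sample sets, the helper scores_for_row, and the Jaccard-distance body of sim_function are all hypothetical.

```python
from functools import partial
from multiprocessing import Pool


def sim_function(row_set, set_from_listofsets):
    # Hypothetical stand-in for the real calculation: Jaccard distance.
    union = row_set | set_from_listofsets
    return 1.0 - len(row_set & set_from_listofsets) / len(union) if union else 0.0


def scores_for_row(pool, row_set, listofsets):
    """Score one row's set against every set in listofsets using an existing pool."""
    return pool.map(partial(sim_function, row_set), listofsets)


def main():
    listofsets = [{"a", "b"}, {"c"}, {"a", "b", "c"}]
    # Create the pool once, inside main(), and reuse it for every row.
    with Pool(processes=2) as pool:
        for row_set in ({"a", "b"}, {"c", "d"}):
            print(scores_for_row(pool, row_set, listofsets))


if __name__ == "__main__":
    # Workers that re-import this module do not reach main(),
    # so they never try to create pools of their own.
    main()
```

Keeping sim_function at module level also matters, because pool.map sends the function to the workers by reference and they need to be able to import it.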
