How do I align two lists according to arbitrary criteria?

Time: 2012-11-19 15:32:57

Tags: python

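Suppose I have two lists of people, persons_a and persons_b. I would like to match each person in persons_a with someone in persons_b based on some arbitrary attribute, such as person.age, person.town_from, or similar.

How can I do this in the most efficient way in Python? Do I just write a for loop?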

criteria = lambda a, b: a.age == b.age

result = []
for a in persons_a:
    for b in persons_b:
        if criteria(a, b):
            result.append(a)

5 answers:

Answer 0: (score: 5)

import itertools

criteria = lambda a, b: a.age == b.age
cross = itertools.product(persons_a, persons_b)
# lazily yields a once for every b that satisfies the criterion
result = (a for a, b in cross if criteria(a, b))

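This is more Pythonic and easier to read. itertools.product is just another way of writing the same nested loop, so the only gain is readability.

With a truly arbitrary criterion you have to test every combination, so you cannot do better than O(n^2). Unless you can short-circuit the loop, or use a greedy algorithm that passes through each of the two lists separately, the code above and yours are the optimal solutions. If your data is semi-structured, say equal-length lists that are also sorted, you can speed up your code with a single pass through the lists; without any structure like that, you will have to stick with the O(n^2) algorithm.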
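
As a minimal sketch of the short-circuiting mentioned above: if one match per person is enough, the inner scan can stop at the first hit. The `Person` record, the sample names, and the helper `first_matches` are hypothetical, introduced only for illustration; the worst case is still O(m * n).

```python
from collections import namedtuple

# Hypothetical record type, for illustration only.
Person = namedtuple("Person", ["name", "age"])

def first_matches(persons_a, persons_b, criteria):
    """Pair each person in persons_a with the first match in persons_b, if any."""
    pairs = []
    for a in persons_a:
        # next() stops scanning persons_b as soon as one candidate matches.
        b = next((b for b in persons_b if criteria(a, b)), None)
        if b is not None:
            pairs.append((a, b))
    return pairs

persons_a = [Person("Ann", 30), Person("Bob", 41)]
persons_b = [Person("Cid", 41), Person("Dee", 30), Person("Eve", 30)]
pairs = first_matches(persons_a, persons_b, lambda a, b: a.age == b.age)
for a, b in pairs:
    print(a.name, "->", b.name)
```

Note that this does not prevent the same person in persons_b from being paired more than once; whether that matters depends on what "match" should mean for your data.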

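Answer 1: (score: 3)

Using nested for loops, as in the question and some of the previous answers, gives an O(m * n) algorithm, where m and n are the sizes of lists a and b (or O(n^2) if the lists are the same size). To get an O(n) or O(n log n) algorithm, you can either (1) use a set or dictionary data structure that supports O(1) or O(log n) membership lookup, or (2) sort a and/or b by the criterion element, to allow O(1) or O(log n) match tests. See the example code below, which uses two sorts of O(n log n) time, followed by an O(n) comparison pass that identifies the matching pairs, for O(n log n) total time.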

import collections
iTuple = collections.namedtuple('item',['data','age'])
p_a, p_b, n, p = [], [], 15, 19

for i in range(n):
   p_a.append(iTuple(i, ( 7*i)%p))
   p_b.append(iTuple(i, (11*i)%p))

sa = sorted(p_a, key=lambda x: x.age)
sb = sorted(p_b, key=lambda x: x.age)
#print ('sa: ', sa, '\n\nsb: ',sb, '\n')

ia, ib, result = 0, 0, []
# merge-style pass: advance whichever index points at the smaller age
while ia < n and ib < n:
   #print (ia, ib)
   if sa[ia].age == sb[ib].age:
      result.append([sa[ia], sb[ib]])
      print ('Match:', sa[ia], '\t', sb[ib],'\tat',ia,ib)
      ia, ib = ia+1, ib+1
   elif sa[ia].age < sb[ib].age:
      ia += 1
   else:
      ib += 1

#print ('Result:', result)

Here is the output of the above program.

Match: item(data=0, age=0)   item(data=0, age=0)    at 0 0
Match: item(data=11, age=1)  item(data=7, age=1)    at 1 1
Match: item(data=3, age=2)   item(data=14, age=2)   at 2 2
Match: item(data=14, age=3)  item(data=2, age=3)    at 3 3
Match: item(data=6, age=4)   item(data=9, age=4)    at 4 4
Match: item(data=9, age=6)   item(data=4, age=6)    at 5 5
Match: item(data=1, age=7)   item(data=11, age=7)   at 6 6
Match: item(data=4, age=9)   item(data=6, age=9)    at 8 7
Match: item(data=7, age=11)  item(data=1, age=11)   at 9 9
Match: item(data=2, age=14)  item(data=3, age=14)   at 11 11
Match: item(data=13, age=15) item(data=10, age=15)  at 12 12
Match: item(data=8, age=18)  item(data=12, age=18)  at 14 14
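
For comparison, option (1) above can be sketched with a dictionary index instead of sorting. This is only an illustration under the assumption that the match key is hashable; the helper name `match_by_key` and the `Item` type are hypothetical, while the data reuses the generator formulas from the program above.

```python
from collections import defaultdict, namedtuple

Item = namedtuple("Item", ["data", "age"])

def match_by_key(list_a, list_b, key):
    """Return every (a, b) pair with key(a) == key(b), via a dict index on list_b."""
    index = defaultdict(list)
    for b in list_b:
        index[key(b)].append(b)          # O(1) average insert per item
    # One O(1) average lookup per item of list_a; .get() avoids adding empty entries.
    return [(a, b) for a in list_a for b in index.get(key(a), [])]

p = 19
p_a = [Item(i, (7 * i) % p) for i in range(15)]
p_b = [Item(i, (11 * i) % p) for i in range(15)]
matches = match_by_key(p_a, p_b, key=lambda x: x.age)
for a, b in matches:
    print("Match:", a, b)
```

Building the index is O(n) and the lookups are O(m) on average, plus the size of the output. Unlike the sorted pass, this returns all pairs when several items share a key.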

Answer 2: (score: 3)

By the dictionary approach, do you mean something like the following?

class Store():

    def __init__(self, types):
        # one index (value -> list of items) per attribute name in types
        self.a = {}
        for i in types:
            self.a[i] = {}

    def addToStore(self, item):  # item is a dictionary
        for key in item.keys():
            if key in self.a:
                # file the item under its value for each indexed attribute
                self.a[key].setdefault(item[key], []).append(item)

    def printtype(self, atype):
        print(atype)
        for value in self.a[atype]:
            print(self.a[atype][value])

if __name__=="__main__" :
    persons= Store(["age","place"])
    persons.addToStore({"name" : "Smith" , "place" : "Bury" , "age" : 32 })
    persons.addToStore({"name" : "Jones" , "place" : "Bolton" , "age" : 35 })
    persons.addToStore({"name" : "Swift" , "place" : "Radcliffe" , "age" : 32 })
    persons.addToStore({"name" : "Issac" , "place" : "Rochdale" , "age" : 32 })
    persons.addToStore({"name" : "Phillips" , "place" : "Bolton" , "age" : 26 })
    persons.addToStore({"name" : "Smith" , "place" : "Bury" , "age" : 41 })
    persons.addToStore({"name" : "Smith" , "place" : "Ramsbottom" , "age" : 25 })
    persons.addToStore({"name" : "Wilson" , "place" : "Bolton" , "age" : 26 })
    persons.addToStore({"name" : "Jones" , "place" : "Heywood" , "age" : 72 })
    persons.printtype("age")
    persons.printtype("place")

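Answer 3: (score: 1)

result = [a for a in persons_a for b in persons_b if criteria(a, b)]

This is the loop in list comprehension form.

Depending on what you plan to do with the result, you could also use a generator expression, which looks almost identical:

result = (a for a in persons_a for b in persons_b if criteria(a, b))

The difference is that it does not hold the whole result in memory. Instead, it produces values on demand, like yield in a generator function.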
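
A small sketch of that on-demand behaviour, using plain integers as hypothetical stand-ins for person objects and a counter to show when the criterion is actually evaluated:

```python
calls = 0

def criteria(a, b):
    """Equality test that also counts how often it is invoked."""
    global calls
    calls += 1
    return a == b

persons_a = [1, 2, 3]   # stand-ins for person objects
persons_b = [3, 2, 1]

lazy = (a for a in persons_a for b in persons_b if criteria(a, b))
calls_before = calls            # 0: creating the generator tests nothing

first = next(lazy)              # scans only until the first match is found
calls_after_first = calls       # 3: (1,3), (1,2), (1,1)

rest = list(lazy)               # consuming the rest runs the remaining tests
print(first, rest, calls_before, calls_after_first, calls)
```

If only the first few matches are needed, the generator form therefore skips the remaining comparisons as well as the storage.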

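Answer 4: (score: 1)

Assuming you are able to sort the persons by that custom criterion:

Use itertools.groupby to build a dictionary for one of the lists (the sorting should be O(n log n)). Finding (exact) matches for the other list is then very efficient: O(m), i.e. constant time for each person in the other list.

Here is an illustrative implementation: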

import random
import collections
import itertools
iTuple = collections.namedtuple('Person', ['town', 'age'])

# make up data
random.seed(1)
def random_person():
    age = random.randrange(19,49)
    town = random.choice("Edinburgh Glasgow Aberdeen".split())
    return iTuple(town, age)
n_f, n_m = 15, 20
females = [random_person() for _ in range(n_f)]
males = [random_person() for _ in range(n_m)]

# group by criterion of interest: age, town
# (groupby only merges adjacent items, hence the sort by the same key)
by_age, by_town = lambda x: x.age, lambda x: x.town
males_by_age = dict((age, list(group)) for age, group in itertools.groupby(
        sorted(males, key=by_age), key=by_age))
males_by_town = dict((town, list(group)) for town, group in itertools.groupby(
        sorted(males, key=by_town), key=by_town))

You can then query these dictionaries to get the list of candidate matches:

# assign random matches according to grouping variable (if available)
print("matches by age:")
for person in females:
    candidates = males_by_age.get(person.age)
    if candidates:
        print(person, random.choice(candidates))
    else:
        print(person, "no match found")

print("matches by town:")
for person in females:
    candidates = males_by_town.get(person.town)
    if candidates:
        print(person, random.choice(candidates))
    else:
        print(person, "no match found")

The output is similar to:

matches by age:
Person(town='Aberdeen', age=23) no match found
Person(town='Edinburgh', age=41) no match found
Person(town='Glasgow', age=33) no match found
Person(town='Aberdeen', age=38) Person(town='Edinburgh', age=38)
Person(town='Edinburgh', age=21) no match found
Person(town='Glasgow', age=44) Person(town='Glasgow', age=44)
Person(town='Edinburgh', age=41) no match found
Person(town='Aberdeen', age=32) no match found
Person(town='Aberdeen', age=25) Person(town='Edinburgh', age=25)
Person(town='Edinburgh', age=46) no match found
Person(town='Glasgow', age=19) no match found
Person(town='Glasgow', age=47) Person(town='Glasgow', age=47)
Person(town='Glasgow', age=25) Person(town='Glasgow', age=25)
Person(town='Edinburgh', age=19) no match found
Person(town='Glasgow', age=32) no match found
matches by town:
Person(town='Aberdeen', age=23) Person(town='Aberdeen', age=45)
Person(town='Edinburgh', age=41) Person(town='Edinburgh', age=27)
Person(town='Glasgow', age=33) Person(town='Glasgow', age=44)
Person(town='Aberdeen', age=38) Person(town='Aberdeen', age=45)
Person(town='Edinburgh', age=21) Person(town='Edinburgh', age=20)
Person(town='Glasgow', age=44) Person(town='Glasgow', age=24)
Person(town='Edinburgh', age=41) Person(town='Edinburgh', age=38)
Person(town='Aberdeen', age=32) Person(town='Aberdeen', age=34)
Person(town='Aberdeen', age=25) Person(town='Aberdeen', age=40)
Person(town='Edinburgh', age=46) Person(town='Edinburgh', age=38)
Person(town='Glasgow', age=19) Person(town='Glasgow', age=34)
Person(town='Glasgow', age=47) Person(town='Glasgow', age=42)
Person(town='Glasgow', age=25) Person(town='Glasgow', age=34)
Person(town='Edinburgh', age=19) Person(town='Edinburgh', age=27)
Person(town='Glasgow', age=32) Person(town='Glasgow', age=34)
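
One detail of this approach is worth spelling out: itertools.groupby only groups consecutive items with equal keys, which is why the list is sorted by the same key before grouping. A minimal sketch, using hypothetical (name, age) tuples rather than the Person records above:

```python
from itertools import groupby

# Hypothetical sample data: (name, age) tuples.
people = [("Smith", 32), ("Phillips", 26), ("Swift", 32)]
by_age = lambda p: p[1]

# Without sorting, the two 32-year-olds land in separate runs, and the
# dict silently keeps only the last run for each key.
wrong = {age: list(group) for age, group in groupby(people, key=by_age)}

# Sorting by the same key first puts equal keys next to each other.
right = {age: list(group)
         for age, group in groupby(sorted(people, key=by_age), key=by_age)}

print(wrong)
print(right)
```

If sorting is not otherwise needed, a plain dictionary of lists built in one pass gives the same index without the sort.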