Question

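I'm looking for a data structure in Python that resembles an SQL relational table or, if none already exists, some hints for implementing one. Conceptually, the data structure is a set of objects (any objects) that supports efficient lookup/filtering (possibly using SQL-like indexes).

For example, say my objects all have attributes A, B and C that I need to filter on, so I declare that the data should be indexed by them. The objects may contain many other members that are not used for filtering. The data structure should support operations equivalent to SELECT <obj> from <DATASTRUCTURE> where A=100 (and likewise for B and C). It should also be possible to filter by more than one field (where A=100 and B='bar').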

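The requirements are:

It should support a large number of items (~200K). The items must be the objects themselves, not some flattened version of them (this rules out sqlite and probably pandas).
Insertion should be fast and should avoid reallocating memory (which pretty much rules out pandas).
It should support simple filtering (as in the example above), and the filtering must be more efficient than O(len(DATA)), i.e. it must avoid a "full table scan".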
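To make the last requirement concrete, here is a minimal sketch of the naive baseline that the question wants to beat: a linear scan over a plain list. The `Item` class, its `payload` member and the `full_scan` helper are hypothetical names used only for illustration; they do not come from any library.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical object type: A, B and C are the attributes used for filtering.
    A: int
    B: str
    C: float
    payload: dict = field(default_factory=dict)  # other members, never filtered on


def full_scan(data, **criteria):
    """Return every object whose attributes equal all criteria.

    This touches every element, so it costs O(len(data)) per query,
    which is exactly the "full table scan" to be avoided.
    """
    return [
        obj for obj in data
        if all(getattr(obj, name) == value for name, value in criteria.items())
    ]


data = [Item(100, "bar", 1.0), Item(100, "foo", 2.0), Item(7, "bar", 3.0)]
matches_a = full_scan(data, A=100)               # like: where A=100
matches_ab = full_scan(data, A=100, B="bar")     # like: where A=100 and B='bar'
```

The objects stay intact (nothing is flattened into rows), but every query still walks the whole list.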

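Does such a data structure exist?

Please don't suggest using sqlite. I would need to convert object->row and row->object repeatedly, which is both time-consuming and cumbersome, since my objects are not necessarily flat.

Also, please don't suggest pandas: repeatedly inserting rows is too slow, because it may require frequent reallocation.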

Answer 1

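As long as you have no duplicates on (a, b, c), you could subclass dict, store your objects keyed by the tuple (a, b, c), and define a filter method (probably a generator) that returns all entries matching the criteria.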

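class mydict(dict):
    def filter(self, a=None, b=None, c=None):
        # Keys are (a, b, c) tuples; None means "do not filter on this field".
        # Iterate over items(): enumerate(self) would yield (position, key) pairs.
        for key, obj in self.items():
            if a is None or key[0] == a:
                if b is None or key[1] == b:
                    if c is None or key[2] == c:
                        yield obj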

This is an ugly and very inefficient example, but you get the idea. I'm sure there's a better way to implement it with itertools or something.
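The inefficiency comes from the fact that any partial filter has to walk every key. Only when all three fields are given can the dict's own hash lookup be used. A minimal sketch of that distinction follows; the class name `KeyedTable` is hypothetical, and `None` is assumed to mean "no constraint", so `None` cannot itself be a key component here.

```python
class KeyedTable(dict):
    """dict keyed by (a, b, c) tuples, with a generator-based filter."""

    def filter(self, a=None, b=None, c=None):
        wanted = (a, b, c)
        if None not in wanted:
            # Fully specified key: one hash lookup, no scan.
            if wanted in self:
                yield self[wanted]
            return
        # Partial key: still a full scan over all entries.
        for key, obj in self.items():
            if all(w is None or w == k for w, k in zip(wanted, key)):
                yield obj


table = KeyedTable()
table[(1, 2, "x")] = "first"
table[(1, 3, "y")] = "second"
table[(0, 2, "x")] = "third"

by_a = list(table.filter(a=1))                 # scans all keys
exact = list(table.filter(a=0, b=2, c="x"))    # direct lookup
```

Comparing against `None` rather than relying on truthiness also keeps falsy values such as `a=0` usable as filter values.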

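Edit:

I kept thinking about this. I toyed around with a few things last night and ended up storing the objects in a list, together with dictionaries of indexes keyed by the desired fields. Objects are retrieved by taking the intersection of the indexes for all the specified criteria. Like this: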

objs = []
aindex = {}
bindex = {}
cindex = {}

def insertobj(a, b, c, obj):
    # Append the object and record its list position in each field's index.
    idx = len(objs)
    objs.append(obj)
    aindex.setdefault(a, []).append(idx)
    bindex.setdefault(b, []).append(idx)
    cindex.setdefault(c, []).append(idx)

def filterobjs(a=None, b=None, c=None):
    # None means "no constraint". Testing "is None" (not truthiness) keeps
    # falsy values such as a=0 usable as filter values.
    result = None
    for index, value in ((aindex, a), (bindex, b), (cindex, c)):
        if value is None:
            continue
        matches = set(index.get(value, ()))  # unknown value: no matches
        result = matches if result is None else result & matches
    if result is None:
        result = range(len(objs))  # no criteria given: return everything
    for idx in sorted(result):
        yield objs[idx]

class testobj(object):
    def __init__(self,a,b,c):
        self.a = a
        self.b = b
        self.c = c

    def show(self):
        print ('a=%i\tb=%i\tc=%s'%(self.a,self.b,self.c))

if __name__ == '__main__':
    for a in range(20):
        for b in range(5):
            for c in ['one','two','three','four']:
                insertobj(a,b,c,testobj(a,b,c))

    for obj in filterobjs(a=5):
        obj.show()
    print()
    for obj in filterobjs(b=3):
        obj.show()
    print()
    for obj in filterobjs(a=8,c='one'):
        obj.show()

It should be fairly fast: although the objects are kept in a list, they are accessed directly by index. The "searching" is done on hashed dicts.
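The same list-plus-indexes idea can be wrapped in a class so that the indexed attributes are chosen at construction time rather than hard-coded. The following is only a sketch under some assumptions: the name `IndexedTable` and its methods are hypothetical, indexed attribute values must be hashable, and deletion or updating of already-inserted objects is not handled.

```python
from collections import defaultdict
from types import SimpleNamespace


class IndexedTable:
    """Objects live in a list; each indexed attribute gets a hash index."""

    def __init__(self, *indexed_attrs):
        self._objs = []
        # attribute name -> {attribute value -> set of positions in self._objs}
        self._indexes = {name: defaultdict(set) for name in indexed_attrs}

    def insert(self, obj):
        pos = len(self._objs)
        self._objs.append(obj)  # the object itself is stored, not a flattened copy
        for name, index in self._indexes.items():
            index[getattr(obj, name)].add(pos)

    def filter(self, **criteria):
        """Return objects matching all criteria; raises KeyError for a non-indexed name."""
        if not criteria:
            return list(self._objs)
        # .get avoids creating empty index entries for values never inserted.
        hits = [self._indexes[name].get(value, set())
                for name, value in criteria.items()]
        hits.sort(key=len)  # start the intersection from the smallest set
        positions = set.intersection(*hits)
        return [self._objs[p] for p in sorted(positions)]


# Usage with the same test data shape as above (SimpleNamespace stands in for testobj).
table = IndexedTable("a", "b", "c")
for a in range(20):
    for b in range(5):
        for c in ("one", "two", "three", "four"):
            table.insert(SimpleNamespace(a=a, b=b, c=c))

only_a = table.filter(a=5)
a_and_c = table.filter(a=8, c="one")
```

Storing sets in the indexes avoids rebuilding a set from a list on every query, and the work per query is proportional to the sizes of the matching index entries rather than to the total number of objects.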
