Question

I have more than 9000 data attributes in a dictionary. A simplified version looks like this:

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

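They are gene ids, each with two values in a list. What I want is to end up with multiple dictionaries containing all the ids whose values are above or below a certain value. So in this example, say I want to end up with three dictionaries: one containing the ids whose values are both below 85, another where both are above 85, and a last one where the first value is below 85 and the second is above 85. So I would end up with this: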

testabove = { 892456: [88, 88]}

and

testbelow = { 524292: [80, 80]}

and

testboth = {1092268: [81, 90]}

I don't know how to solve this.

Answer 1

This is easy to do with a dictionary comprehension:

>>> testabove = {i:j for i,j in test.items() if j[0]>85 and j[1] > 85}
>>> testbelow = {i:j for i,j in test.items() if j[0]<85 and j[1] < 85}
>>> testboth = {i:j for i,j in test.items() if i not in testabove and i not in testbelow}
>>> testabove
{892456: [88, 88]}
>>> testbelow
{524292: [80, 80]}
>>> testboth
{1092268: [81, 90]}

As Marein mentioned in the comments, another way to do it is:

>>> test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
>>> testabove = {i:j for i,j in test.items() if all(x>85 for x in j)}
>>> testbelow = {i:j for i,j in test.items() if all(x<85 for x in j)}
>>> testabove
{892456: [88, 88]}
>>> testbelow
{524292: [80, 80]}

This uses the all function.

Comparison

$ python -m timeit "test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]};testabove = {i:j for i,j in test.items() if all(x>85 for x in j)}"
100000 loops, best of 3: 2.29 usec per loop
$ python -m timeit "test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]};testabove = {i:j for i,j in test.items() if j[0]>85 and j[1] > 85}"
1000000 loops, best of 3: 0.99 usec per loop

As you can see, the direct way is faster than using all.
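The same comparison can also be run from a script with the standard timeit module. This is only a minimal sketch: the helper names direct_above and all_above are made up for illustration, and the absolute timings will vary with the machine and the Python version.

```python
import timeit

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

def direct_above(d, pivot=85):
    # Index both values explicitly (assumes every list has exactly two items).
    return {k: v for k, v in d.items() if v[0] > pivot and v[1] > pivot}

def all_above(d, pivot=85):
    # Works for lists of any length, but builds a generator per entry.
    return {k: v for k, v in d.items() if all(x > pivot for x in v)}

# Both variants should select exactly the same entries.
assert direct_above(test) == all_above(test)

for fn in (direct_above, all_above):
    seconds = timeit.timeit(lambda: fn(test), number=10000)
    print(fn.__name__, seconds)
```

With only two values per id the indexed form is likely the cheaper one; all is mainly worth it if the lists can have a different length.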

Answer 2

Here is another solution that should be faster for large amounts of data, because it builds all three dictionaries in a single iteration.

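def compare_both(xs, pivot):
    # -1: both values below pivot, 1: both above, 0: anything else
    if xs[0] < pivot and xs[1] < pivot:
        return -1
    if xs[0] > pivot and xs[1] > pivot:
        return 1
    return 0

def sort_dict(d, pivot):
    # index 0: both below, index 1: mixed, index 2: both above
    dicts = [{}, {}, {}]
    for key, value in d.items():
        dicts[compare_both(value, pivot) + 1][key] = value
    return dicts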
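As a usage sketch, here is how the result could be unpacked for the sample data from the question. The two functions are repeated so the snippet runs on its own; the variable names below, mixed and above are only illustrative.

```python
def compare_both(xs, pivot):
    # -1: both values below pivot, 1: both above, 0: anything else
    if xs[0] < pivot and xs[1] < pivot:
        return -1
    if xs[0] > pivot and xs[1] > pivot:
        return 1
    return 0

def sort_dict(d, pivot):
    # index 0: both below, index 1: mixed, index 2: both above
    dicts = [{}, {}, {}]
    for key, value in d.items():
        dicts[compare_both(value, pivot) + 1][key] = value
    return dicts

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
below, mixed, above = sort_dict(test, 85)
print(below)  # {524292: [80, 80]}
print(mixed)  # {1092268: [81, 90]}
print(above)  # {892456: [88, 88]}
```

Note that the middle dictionary also collects any entry where a value equals the pivot, since neither strict comparison matches in that case.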

Answer 3

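There are two basic options: one if you need an iterable, and one if you need a more permanent data structure. The more permanent solution is to initialize all three target dictionaries, then iterate over the source dictionary and sort each entry into the appropriate one.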

target_dicts = {'aboveabove':{}, 'belowbelow':{}, 'belowabove':{}}
for k,v in src_dict.items():
    first = 'above' if v[0] > 85 else 'below'
    second = 'above' if v[1] > 85 else 'below'
    result = first+second  # 'aboveabove', 'belowbelow', etc...
    if result in target_dicts:
        target_dicts[result][k] = v
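The loop above silently drops any entry labelled 'abovebelow', because that key is not in target_dicts. If that fourth combination should be kept as well, one possible variation uses collections.defaultdict so buckets are created on demand. This is a sketch, not part of the original answer; the function name bucket_by_threshold and its threshold parameter are hypothetical.

```python
from collections import defaultdict

def bucket_by_threshold(src_dict, threshold=85):
    # Buckets are created lazily, so every label that occurs is kept.
    buckets = defaultdict(dict)
    for key, value in src_dict.items():
        # As in the loop above, a value equal to the threshold counts as 'below'.
        first = 'above' if value[0] > threshold else 'below'
        second = 'above' if value[1] > threshold else 'below'
        buckets[first + second][key] = value
    return dict(buckets)

src_dict = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
target_dicts = bucket_by_threshold(src_dict)
print(target_dicts['belowabove'])  # {1092268: [81, 90]}
```

A label that never occurs (here 'abovebelow') is simply absent from the result, so target_dicts.get(label, {}) is the safe way to read it.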

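This will populate your target_dicts dictionaries appropriately. But maybe you don't need to use all of them? You may only need an iterator, rather than actually rebuilding those dictionaries in memory. Let's use filter instead!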

target_iterators = {
    'aboveabove': filter(
        lambda k: all(v > 85 for v in src_dict[k]), src_dict),
    'belowbelow': filter(
        lambda k: all(v <= 85 for v in src_dict[k]), src_dict),
    'belowabove': filter(
        lambda k: src_dict[k][0] <= 85 and src_dict[k][1] > 85, src_dict)}
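Two things are worth keeping in mind when consuming these iterators, assuming Python 3 (where filter returns a lazy iterator rather than a list): they yield only the keys, and each one can be walked only once. A small sketch with the sample data from the question:

```python
src_dict = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

# Same predicate as the 'aboveabove' entry above.
above_keys = filter(lambda k: all(v > 85 for v in src_dict[k]), src_dict)

# The iterator yields keys, so look the values up in src_dict.
first_pass = [(k, src_dict[k]) for k in above_keys]
print(first_pass)   # [(892456, [88, 88])]

# A second pass finds the iterator already exhausted.
second_pass = list(above_keys)
print(second_pass)  # []
```

If the same selection is needed more than once, either rebuild the filter each time or fall back to the dictionary-based approach.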

Sorting the values of lists in a dictionary