Split dictionary items into smaller dictionaries based on a condition

Time: 2018-06-09 12:59:03

Tags: python

I have two lists: one contains partial transactions and the other contains their parent transactions:

partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']

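I zipped these lists together, intending to treat the result as a dictionary (note that zip yields (partial, parent) tuples rather than a dict):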

transactions = zip(partials, parents)

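As you can see, some partial transactions share the same parent transaction.

I need to split the items into smaller groups (smaller dictionaries?) so that each group contains at most one transaction per parent. For example, all transactions with parent 'a' need to end up in different groups.

I also need as few groups as possible, because in the real world each group is a file that has to be uploaded manually.

The expected output would be something like this:

Group 1 would contain transactions 1a, 2b, 3c, 4d, 7f,

Group 2 would contain transactions 5a, 6d, 8c,

Group 3 would contain transactions 9c, 10a

I have been scratching my head over this and would appreciate any suggestions. So far I have no working code to post.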

3 Answers:

Answer 0: (score: 2)

Here is one approach:

def bin_unique(partials, parents):
    bins = []
    for (ptx,par) in zip(partials, parents):
        pair_assigned = False
        # Try to find an existing bin that doesn't contain the parent.
        for bin_contents in bins:
            if par not in bin_contents:
                bin_contents[par] = (ptx, par)
                pair_assigned = True
                break
        # If we haven't been able to assign the pair, create a new bin
        #   (with the pair as its first entry)
        if not pair_assigned:
            bins.append({par: (ptx, par)})

    return bins

Usage

partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']
binned = bin_unique(partials, parents)

Output

# Print the list of all dicts
print(binned)
# [
#   {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')}, 
#   {'a': (5, 'a'), 'd': (6, 'd'), 'c': (8, 'c')}, 
#   {'c': (9, 'c'), 'a': (10, 'a')}
# ]

# You can access the bins via index
print(binned[0])            # {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')}
print(len(binned))          # 3

# Each bin is a dictionary, keyed by parent, but the values are the (partial, parent) pair
print(binned[0].keys())     # dict_keys(['a', 'b', 'c', 'd', 'f'])
print(binned[0].values())   # dict_values([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')])

# To show that all the transactions exist
all_pairs = [pair for b in binned for pair in b.values()]
print(sorted(all_pairs) == sorted(zip(partials, parents)))  # True
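
Since the question asks for as few groups as possible, it may help to check a result against a simple lower bound: a parent that occurs k times needs at least k groups, so the count of the most frequent parent is the minimum. The sketch below is only a sanity check, not part of the answer above; `min_groups_needed` and `is_valid_grouping` are hypothetical helper names, and the groups are written out as plain lists of (partial, parent) pairs.

```python
from collections import Counter

def min_groups_needed(parents):
    """Lower bound on the number of groups: the count of the most frequent parent."""
    return max(Counter(parents).values(), default=0)

def is_valid_grouping(groups):
    """Return True if no parent appears more than once within any single group."""
    for group in groups:
        seen = [parent for _, parent in group]
        if len(seen) != len(set(seen)):
            return False
    return True

parents = ['a', 'b', 'c', 'd', 'a', 'd', 'f', 'c', 'c', 'a']
# The grouping from the expected output, as lists of (partial, parent) pairs.
groups = [
    [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')],
    [(5, 'a'), (6, 'd'), (8, 'c')],
    [(9, 'c'), (10, 'a')],
]

print(min_groups_needed(parents))                  # 3
print(is_valid_grouping(groups))                   # True
print(len(groups) == min_groups_needed(parents))   # True
```

To check the output of bin_unique with these helpers, the bins could be passed as `[b.values() for b in binned]`, since each bin's values are the (partial, parent) pairs.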

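Answer 1: (score: 1)

One approach is simply to keep track of how many times you have seen a given parent. The first time you see parent 'a', you add that partial/parent pair to the first group; the second time, to the second group; and so on.

For example: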

def split_into_groups(transactions):
    counts = {}
    out_groups = {}
    for partial, parent in transactions:
        counts[parent] = target = counts.get(parent, 0) + 1
        out_groups.setdefault(target, {})[partial] = parent
    return out_groups

which gives me

In [9]: split_into_groups(zip(partials, parents))
Out[9]: 
{1: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 7: 'f'},
 2: {5: 'a', 6: 'd', 8: 'c'},
 3: {9: 'c', 10: 'a'}}

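This uses counts.get to fall back to a default of 0 when the parent has not been counted yet, and out_groups.setdefault to create an empty dictionary and store it in out_groups when that target group number has not been seen yet.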
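
The same counting idea can arguably be written a little more compactly with collections.defaultdict, which supplies the default values that counts.get and setdefault provide above. This is a sketch of an equivalent variant, not the answer's own code; `split_with_defaultdict` is a hypothetical name.

```python
from collections import defaultdict

def split_with_defaultdict(transactions):
    counts = defaultdict(int)        # parent -> number of times seen so far
    out_groups = defaultdict(dict)   # group number -> {partial: parent}
    for partial, parent in transactions:
        counts[parent] += 1
        # The n-th occurrence of a parent goes into group n.
        out_groups[counts[parent]][partial] = parent
    return dict(out_groups)          # plain dict for the caller

partials = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
parents = ['a', 'b', 'c', 'd', 'a', 'd', 'f', 'c', 'c', 'a']
result = split_with_defaultdict(zip(partials, parents))
print(result)
```

For the sample data this should produce the same three groups as the function above.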

If you have to handle duplicate partials, you can replace the setdefault line with

out_groups.setdefault(target, []).append((partial, parent))

which turns the group members into a list of tuples instead of a dictionary:

{1: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')],
 2: [(5, 'a'), (6, 'd'), (8, 'c')],
 3: [(9, 'c'), (10, 'a')]}
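
To make the duplicate-partial case concrete, here is the full function with that line swapped in, run on a small made-up input where partial 1 occurs under two different parents. The name `split_into_group_lists` and the input are assumptions for illustration only.

```python
def split_into_group_lists(transactions):
    counts = {}
    out_groups = {}
    for partial, parent in transactions:
        counts[parent] = target = counts.get(parent, 0) + 1
        # A list of pairs keeps every transaction, even if a partial repeats.
        out_groups.setdefault(target, []).append((partial, parent))
    return out_groups

# Hypothetical input: partial 1 appears twice, under parents 'a' and 'b'.
pairs = [(1, 'a'), (1, 'b'), (2, 'a')]
grouped = split_into_group_lists(pairs)
print(grouped)
# {1: [(1, 'a'), (1, 'b')], 2: [(2, 'a')]}
```

With the dictionary version keyed by partial, the pair (1, 'b') would overwrite (1, 'a') in group 1, so one transaction would be silently lost.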

Answer 2: (score: 0)

@barciewicz, based on the input and expected output you provided, I have also tried to solve this in my own way.

  

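Note » I used OrderedDict() from the collections module to preserve the order of keys in the dictionary. I also used the json module to pretty-print dictionaries, lists, etc.

I have shown the result in 3 different forms, produced by 3 separate functions, as follows.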

  

{
    "group1": "1a,2b,3c,4d,7f",
    "group2": "5a,8c,6d",
    "group3": "10a,9c"
}

{
    "group1": {
        "a": 1,
        "b": 2,
        "c": 3,
        "d": 4,
        "f": 7
    },
    "group2": {
        "a": 5,
        "c": 8,
        "d": 6
    },
    "group3": {
        "a": 10,
        "c": 9
    }
}

{
    "a": [1, 5, 10],
    "b": [2],
    "c": [3, 8, 9],
    "d": [4, 6],
    "f": [7]
}

» Try the code below online at http://rextester.com/OYFF74927.

from collections import OrderedDict
import json

def get_formatted_transactions(parents, partials):
    # Map each parent to the list of its partials, in order of first appearance.
    d = OrderedDict()

    for parent, partial in zip(parents, partials):
        d.setdefault(parent, []).append(partial)

    return d


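def get_groups(transactions):
    # Note: this empties the transactions mapping while building the groups.
    i = 1
    groups = OrderedDict()

    while transactions:
        group_name = "group" + str(i)
        groups[group_name] = {}
        # Iterate over a snapshot of the keys, because entries are deleted below.
        keys = list(transactions.keys())

        for key in keys:
            if transactions[key]:
                # Move this parent's next partial into the current group.
                groups[group_name][key] = transactions[key].pop(0)
                if not transactions[key]:
                    del transactions[key]
            else:
                # Drop parents that have no partials left.
                del transactions[key]
        i += 1

    return groups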

def get_comma_separated_data(groups):
    # Render each group as a string such as "1a,2b,3c".
    new_dict = OrderedDict()
    for group_name in groups:
        d = groups[group_name]
        new_dict[group_name] = ",".join(str(value) + key for key, value in d.items())

    return new_dict



# Starting point
if __name__ == "__main__":
    partials = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    parents = ['a', 'b', 'c', 'd', 'a', 'd', 'f', 'c', 'c', 'a']

    transactions = get_formatted_transactions(parents, partials)
    # Pretty-print the ordered dictionary
    print(json.dumps(transactions, indent=4))

    print("\n")

    # Create groups to organize the transactions
    groups = get_groups(transactions)
    # Pretty-print the groups
    print(json.dumps(groups, indent=4))

    print("\n")

    # Get the comma-separated form
    comma_separated_data = get_comma_separated_data(groups)
    # Pretty-print the comma-separated form
    print(json.dumps(comma_separated_data, indent=4))

Output »
{
    "a": [
        1,
        5,
        10
    ],
    "b": [
        2
    ],
    "c": [
        3,
        8,
        9
    ],
    "d": [
        4,
        6
    ],
    "f": [
        7
    ]
}

{
    "group1": {
        "a": 1,
        "b": 2,
        "c": 3,
        "d": 4,
        "f": 7
    },
    "group2": {
        "a": 5,
        "c": 8,
        "d": 6
    },
    "group3": {
        "a": 10,
        "c": 9
    }
}

{
    "group1": "1a,2b,3c,4d,7f",
    "group2": "5a,8c,6d",
    "group3": "10a,9c"
}
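
The per-parent lists in the first output can also be turned into groups by transposing them: the first partial of every parent forms group 1, the second partials form group 2, and so on. Below is a rough sketch of that idea using itertools.zip_longest; it is an alternative formulation rather than the code above, and `groups_by_transposing` is a hypothetical name.

```python
from itertools import zip_longest

def groups_by_transposing(parents, partials):
    # parent -> list of (partial, parent) pairs, in order of first appearance
    per_parent = {}
    for parent, partial in zip(parents, partials):
        per_parent.setdefault(parent, []).append((partial, parent))

    missing = object()  # sentinel marking parents that have run out of partials
    groups = []
    # Each "row" takes the n-th pair from every parent's list.
    for row in zip_longest(*per_parent.values(), fillvalue=missing):
        groups.append([pair for pair in row if pair is not missing])
    return groups

partials = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
parents = ['a', 'b', 'c', 'd', 'a', 'd', 'f', 'c', 'c', 'a']
transposed = groups_by_transposing(parents, partials)
for number, group in enumerate(transposed, start=1):
    print(number, group)
# 1 [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')]
# 2 [(5, 'a'), (8, 'c'), (6, 'd')]
# 3 [(10, 'a'), (9, 'c')]
```

The number of rows equals the length of the longest per-parent list, so this also yields the smallest possible number of groups for the sample data.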