Question

I have two lists: one containing partial transactions, and the other containing their parent transactions:

partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']

I zipped these lists into a dictionary:

transactions = zip(partials, parents)

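As you can see, some partial transactions share the same parent transaction.

I need to split the items in the dictionary into smaller groups (smaller dictionaries?) so that each group contains at most one transaction per parent. So, for example, all transactions with parent "a" need to end up in different groups.

I also need as few groups as possible, because in the real world each group is a file that gets uploaded manually.

The expected output would look like this:

Group 1 would contain transactions 1a, 2b, 3c, 4d, 7f,

Group 2 would contain transactions 5a, 6d, 8c,

Group 3 would contain transactions 9c, 10a

I have been scratching my head over this and would appreciate any suggestions. So far I don't have any working code to post.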

Answer 1

Here is one approach:

def bin_unique(partials, parents):
    bins = []
    for (ptx,par) in zip(partials, parents):
        pair_assigned = False
        # Try to find an existing bin that doesn't contain the parent.
        for bin_contents in bins:
            if par not in bin_contents:
                bin_contents[par] = (ptx, par)
                pair_assigned = True
                break
        # If we haven't been able to assign the pair, create a new bin
        #   (with the pair as its first entry)
        if not pair_assigned:
            bins.append({par: (ptx, par)})

    return bins

Usage

partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']
binned = bin_unique(partials, parents)

Output

# Print the list of all dicts
print(binned)
# [
#   {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')}, 
#   {'a': (5, 'a'), 'd': (6, 'd'), 'c': (8, 'c')}, 
#   {'c': (9, 'c'), 'a': (10, 'a')}
# ]

# You can access the bins via index
print(binned[0])            # {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')}
print(len(binned))          # 3

# Each bin is a dictionary, keyed by parent, but the values are the (partial, parent) pair
print(binned[0].keys())     # dict_keys(['a', 'b', 'c', 'd', 'f'])
print(binned[0].values())   # dict_values([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')])

# To show that all the transactions exist
all_pairs = [pair for b in binned for pair in b.values()]
print(sorted(all_pairs) == sorted(zip(partials, parents)))  # True
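
As a sanity check on the "as few groups as possible" requirement, here is a minimal sketch that compares the number of bins with a lower bound: no valid grouping can use fewer groups than the number of times the most frequent parent occurs. The helper name `min_groups_needed` is hypothetical and not part of the answer above; `bin_unique` is repeated in condensed form (using `for`/`else`) only so the snippet runs on its own.

```python
from collections import Counter

def bin_unique(partials, parents):
    bins = []
    for ptx, par in zip(partials, parents):
        # Put the pair into the first bin that does not contain this parent yet.
        for bin_contents in bins:
            if par not in bin_contents:
                bin_contents[par] = (ptx, par)
                break
        else:
            # No break happened: every existing bin already has this parent.
            bins.append({par: (ptx, par)})
    return bins

def min_groups_needed(parents):
    # Hypothetical helper: each occurrence of a parent needs its own group,
    # so the highest occurrence count is a lower bound on the group count.
    return max(Counter(parents).values(), default=0)

partials = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
parents = ['a', 'b', 'c', 'd', 'a', 'd', 'f', 'c', 'c', 'a']
binned = bin_unique(partials, parents)
print(len(binned), min_groups_needed(parents))  # 3 3
```

For the sample data both numbers are 3, so the greedy first-fit result cannot be improved on here.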

Answer 2

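One approach is simply to keep track of how many times you have seen a given parent. The first time you see parent 'a', you add that partial/parent pair to the first group; the second time, to the second group; and so on.

For example: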

def split_into_groups(transactions):
    counts = {}
    out_groups = {}
    for partial, parent in transactions:
        counts[parent] = target = counts.get(parent, 0) + 1
        out_groups.setdefault(target, {})[partial] = parent
    return out_groups

gives me

In [9]: split_into_groups(zip(partials, parents))
Out[9]: 
{1: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 7: 'f'},
 2: {5: 'a', 6: 'd', 8: 'c'},
 3: {9: 'c', 10: 'a'}}

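This uses counts.get to fall back to a default of 0 if the parent has no count yet, and out_groups.setdefault to create a default empty dictionary and store it in out_groups if we haven't seen that target count yet.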
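
To see those two calls in isolation, here is a small sketch of a single loop step for the pair (1, 'a'). The variable names `first`, `group` and `same_group` are illustrative only.

```python
counts = {}
out_groups = {}

# dict.get returns the fallback (0) when the key is missing;
# it does not insert the key by itself.
first = counts.get('a', 0) + 1
counts['a'] = first

# dict.setdefault stores {} under the key if it is absent,
# then returns whatever is stored under that key.
group = out_groups.setdefault(first, {})
group[1] = 'a'

# A second call with the same key returns the same dict object.
same_group = out_groups.setdefault(first, {})
print(same_group is group)  # True
print(out_groups)           # {1: {1: 'a'}}
```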

If you have to handle the case of duplicate partials, you can replace the setdefault line with

out_groups.setdefault(target, []).append((partial, parent))

which turns the group members into a list of tuples instead of a dictionary:

{1: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')],
 2: [(5, 'a'), (6, 'd'), (8, 'c')],
 3: [(9, 'c'), (10, 'a')]}
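
A short sketch of why the list version matters: when the same partial number appears twice, the dictionary version keys each group by partial, so a later pair silently overwrites an earlier one. The function names and the sample data below are made up for illustration.

```python
def split_into_dict_groups(transactions):
    counts, out_groups = {}, {}
    for partial, parent in transactions:
        counts[parent] = target = counts.get(parent, 0) + 1
        # Keyed by partial: a repeated partial replaces the earlier entry.
        out_groups.setdefault(target, {})[partial] = parent
    return out_groups

def split_into_list_groups(transactions):
    counts, out_groups = {}, {}
    for partial, parent in transactions:
        counts[parent] = target = counts.get(parent, 0) + 1
        # Appending keeps every pair, even when partials repeat.
        out_groups.setdefault(target, []).append((partial, parent))
    return out_groups

pairs = [(1, 'a'), (1, 'b'), (2, 'a')]   # partial 1 appears twice
print(split_into_dict_groups(pairs))  # {1: {1: 'b'}, 2: {2: 'a'}}
print(split_into_list_groups(pairs))  # {1: [(1, 'a'), (1, 'b')], 2: [(2, 'a')]}
```

In the dictionary result the pair (1, 'a') has been lost; the list result keeps all three pairs.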

Answer 3

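@barciewicz, based on the input and expected output you provided, I have also tried to solve this problem in my own way.

Note » I have used OrderedDict() from the collections module to preserve the order of keys in the dictionary. I have also used the json module to pretty-print dictionaries, lists, etc.

I have shown 3 different forms of the result, produced by 3 separate functions, as follows.

{"group1": "1a,2b,3c,4d,7f", "group2": "5a,8c,6d", "group3": "10a,9c"}

{"group1": {"a": 1, "b": 2, "c": 3, "d": 4, "f": 7}, "group2": {"a": 5, "c": 8, "d": 6}, "group3": {"a": 10, "c": 9}}

{"a": [1, 5, 10], "b": [2], "c": [3, 8, 9], "d": [4, 6], "f": [7]}

» Try the code below online at http://rextester.com/OYFF74927.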

from collections import OrderedDict
import json

def get_formatted_transactions(parents, partials):
    # Map each parent to the list of its partials, in input order.
    d = OrderedDict()

    for index, partial in enumerate(partials):
        if parents[index] in d:
            d[parents[index]].append(partial)
        else:
            d[parents[index]] = [partial]

    return d


def get_groups(transactions):
    # Note: this consumes `transactions`; it is empty when the function returns.
    i = 1
    groups = OrderedDict()

    while transactions:
        group_name = "group" + str(i)
        groups[group_name] = {}
        keys = list(transactions.keys())

        # Take one partial from every parent that still has some left.
        for key in keys:
            if transactions[key]:
                groups[group_name][key] = transactions[key].pop(0)
                if not transactions[key]:
                    del transactions[key]
            else:
                del transactions[key]
        i += 1

    return groups

def get_comma_separated_data(groups):
    # Render each group as a string such as "1a,2b,3c".
    new_dict = OrderedDict()
    for group_name in groups:
        d = groups[group_name]
        new_dict[group_name] = ",".join([str(value) + key for value, key in zip(d.values(), d.keys())])

    return new_dict



# Starting point
if __name__ == "__main__":
    partials = [1,2,3,4,5,6,7,8,9,10]
    parents = ['a','b','c','d','a','d','f','c','c','a']

    transactions = get_formatted_transactions(parents, partials)
    # Pretty printing the ordered dictionary
    print(json.dumps(transactions, indent=4))

    print("\n")

    # Creating groups to organize transactions
    groups = get_groups(transactions)
    # Pretty printing
    print(json.dumps(groups, indent=4))

    print("\n")

    # Get comma separated form
    comma_separated_data = get_comma_separated_data(groups)
    # Pretty printing
    print(json.dumps(comma_separated_data, indent=4))

Output »

{
    "a": [
        1,
        5,
        10
    ],
    "b": [
        2
    ],
    "c": [
        3,
        8,
        9
    ],
    "d": [
        4,
        6
    ],
    "f": [
        7
    ]
}

{
    "group1": {
        "a": 1,
        "b": 2,
        "c": 3,
        "d": 4,
        "f": 7
    },
    "group2": {
        "a": 5,
        "c": 8,
        "d": 6
    },
    "group3": {
        "a": 10,
        "c": 9
    }
}

{
    "group1": "1a,2b,3c,4d,7f",
    "group2": "5a,8c,6d",
    "group3": "10a,9c"
}
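
On Python 3.7 and later the built-in dict also preserves insertion order, so the first step of this answer could be sketched without OrderedDict. The function name `group_partials_by_parent` is hypothetical; this is a condensed variant under that version assumption, not the code from the answer.

```python
import json

def group_partials_by_parent(parents, partials):
    # Plain dict: keys stay in first-seen order on Python 3.7+.
    d = {}
    for parent, partial in zip(parents, partials):
        # setdefault creates the list on first sight of a parent.
        d.setdefault(parent, []).append(partial)
    return d

partials = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
parents = ['a', 'b', 'c', 'd', 'a', 'd', 'f', 'c', 'c', 'a']

transactions = group_partials_by_parent(parents, partials)
print(json.dumps(transactions))
# {"a": [1, 5, 10], "b": [2], "c": [3, 8, 9], "d": [4, 6], "f": [7]}
```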

Split dictionary items into smaller dictionaries based on a condition

3 answers: