I have two lists: one containing partial transactions and the other containing their parent transactions:
partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']
I zip these lists into a dictionary:
transactions = zip(partials, parents)
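As you can see, some partial transactions share the same parent transaction.
I need to split the items in the dictionary into smaller groups (smaller dictionaries?) so that within each group at most one transaction belongs to any given parent. So, for example, all transactions with parent 'a' need to end up in different groups.
I also need as few groups as possible, because in the real world each group will be a file that is uploaded manually.
The expected output would be something like this:
Group 1 will contain transactions 1a, 2b, 3c, 4d, 7f,
Group 2 will contain transactions 5a, 6d, 8c,
Group 3 will contain transactions 9c, 10a
I have been scratching my head over this for a while and would appreciate any suggestions. So far I have no working code to post.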
Answer 0 (score: 2)
Here is one approach:
def bin_unique(partials, parents):
    bins = []
    for (ptx, par) in zip(partials, parents):
        pair_assigned = False
        # Try to find an existing bin that doesn't contain the parent.
        for bin_contents in bins:
            if par not in bin_contents:
                bin_contents[par] = (ptx, par)
                pair_assigned = True
                break
        # If we haven't been able to assign the pair, create a new bin
        # (with the pair as its first entry)
        if not pair_assigned:
            bins.append({par: (ptx, par)})
    return bins
Usage
partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']
binned = bin_unique(partials, parents)
Output
# Print the list of all dicts
print(binned)
# [
# {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')},
# {'a': (5, 'a'), 'd': (6, 'd'), 'c': (8, 'c')},
# {'c': (9, 'c'), 'a': (10, 'a')}
# ]
# You can access the bins via index
print(binned[0]) # {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')}
print(len(binned)) # 3
# Each bin is a dictionary, keyed by parent, but the values are the (partial, parent) pair
print(binned[0].keys()) # dict_keys(['a', 'b', 'c', 'd', 'f'])
print(binned[0].values()) # dict_values([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')])
# To show that all the transactions exist
all_pairs = [pair for b in binned for pair in b.values()]
print(sorted(all_pairs) == sorted(zip(partials, parents))) # True
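As a sanity check on the "as few groups as possible" requirement: no grouping can use fewer groups than the number of times the most frequent parent occurs, because each of that parent's transactions needs its own group. A minimal sketch of that lower bound, assuming the same sample list (the helper name `min_groups_needed` is made up for illustration):

```python
from collections import Counter

def min_groups_needed(parents):
    # Hypothetical helper: the most frequent parent sets the lower bound,
    # since each of its transactions must land in a different group.
    counts = Counter(parents)
    return max(counts.values(), default=0)

parents = ['a','b','c','d','a','d','f','c','c','a']
print(min_groups_needed(parents))  # 3
```

For the sample data the bound is 3, which matches the 3 bins that bin_unique returns above.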
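Answer 1 (score: 1)
One approach is simply to keep track of how many times you have seen a given parent. The first time you see parent 'a', you add that partial/parent pair to the first group; the second time, to the second group; and so on.
For example: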
def split_into_groups(transactions):
    counts = {}
    out_groups = {}
    for partial, parent in transactions:
        counts[parent] = target = counts.get(parent, 0) + 1
        out_groups.setdefault(target, {})[partial] = parent
    return out_groups
gives me
In [9]: split_into_groups(zip(partials, parents))
Out[9]:
{1: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 7: 'f'},
2: {5: 'a', 6: 'd', 8: 'c'},
3: {9: 'c', 10: 'a'}}
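This uses counts.get to get a default value of 0 if the parent has no count yet, and out_groups.setdefault to create a default empty dictionary and put it into out_groups if we haven't seen that target count yet.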
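To make the two dictionary methods concrete, here is a small standalone sketch of a single loop step, written out by hand. The variable names `first`, `group`, and `same_group` are only for illustration:

```python
counts = {}
out_groups = {}

# get returns the default (0) because 'a' has no count yet
first = counts.get('a', 0) + 1
counts['a'] = first

# setdefault stores an empty dict under key 1 and returns it
group = out_groups.setdefault(first, {})
group[1] = 'a'

# a second call with the same key returns the existing dict, not a new one
same_group = out_groups.setdefault(first, {})
print(same_group is group)  # True
print(out_groups)           # {1: {1: 'a'}}
```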
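If you have to handle the case of duplicate partials, you could replace the setdefault line with
out_groups.setdefault(target, []).append((partial, parent))
which would turn the group members into lists of tuples instead of dictionaries: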
{1: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')],
2: [(5, 'a'), (6, 'd'), (8, 'c')],
3: [(9, 'c'), (10, 'a')]}
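To show why the list form matters, here is a sketch of the full function with that line swapped in, run on made-up data where partial 1 appears twice. The function name `split_into_group_lists` and the sample lists are assumptions for illustration only:

```python
def split_into_group_lists(transactions):
    # Same counting idea as split_into_groups, but each group is a list
    # of (partial, parent) pairs, so repeated partials are all kept.
    counts = {}
    out_groups = {}
    for partial, parent in transactions:
        counts[parent] = target = counts.get(parent, 0) + 1
        out_groups.setdefault(target, []).append((partial, parent))
    return out_groups

# Assumed sample data in which partial 1 occurs twice
dup_partials = [1, 1, 2]
dup_parents = ['a', 'b', 'a']
dup_groups = split_into_group_lists(zip(dup_partials, dup_parents))
print(dup_groups)  # {1: [(1, 'a'), (1, 'b')], 2: [(2, 'a')]}
```

With the dictionary version, the first group would be keyed by partial, so (1, 'b') would overwrite (1, 'a') and one transaction would be lost.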
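Answer 2 (score: 0)
@barciewicz, based on the input and expected output you provided, I have also tried to solve this problem in my own way.
Note » I have used OrderedDict() from the collections module to preserve the order of keys in the dictionary. I have also used the json module to pretty-print dictionaries, lists, etc.
I have shown 3 different ways of presenting the result, produced by 3 separate functions, as follows.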
{ "group1": "1a,2b,3c,4d,7f", "group2": "5a,8c,6d", "group3": "10a,9c" }
{ "group1": { "a": 1, "b": 2, "c": 3, "d": 4, "f": 7 }, "group2": { "a": 5, "c": 8, "d": 6 }, "group3": { "a": 10, "c": 9 } }
{ "a": [ 1, 5, 10 ], "b": [ 2 ], "c": [ 3, 8, 9 ], "d": [ 4, 6 ], "f": [ 7 ] }
» Try the following code online at http://rextester.com/OYFF74927.
from collections import OrderedDict
import json

def get_formatted_transactions(parents, partials):
    # Map each parent to the list of its partials, in order of appearance
    d = OrderedDict()
    for index, partial in enumerate(partials):
        if parents[index] in d:
            d[parents[index]].append(partial)
        else:
            d[parents[index]] = [partial]
    return d

def get_groups(transactions):
    # Note: this empties the transactions dictionary as it builds the groups
    i = 1
    groups = OrderedDict()
    while transactions:
        group_name = "group" + str(i)
        groups[group_name] = {}
        keys = list(transactions.keys())
        for key in keys:
            if transactions[key]:
                # Take the next partial of this parent for the current group
                groups[group_name][key] = transactions[key].pop(0)
                if not transactions[key]:
                    del transactions[key]
            else:
                del transactions[key]
        i += 1
    return groups

def get_comma_separated_data(groups):
    new_dict = OrderedDict()
    for group_name in groups:
        d = groups[group_name]
        new_dict[group_name] = ",".join([str(value) + key for value, key in zip(d.values(), d.keys())])
    return new_dict

# Starting point
if __name__ == "__main__":
    partials = [1,2,3,4,5,6,7,8,9,10]
    parents = ['a','b','c','d','a','d','f','c','c','a']
    transactions = get_formatted_transactions(parents, partials)
    # Pretty printing the ordered dictionary
    print(json.dumps(transactions, indent=4))
    print("\n")
    # Creating groups to organize transactions
    groups = get_groups(transactions)
    # Pretty printing
    print(json.dumps(groups, indent=4))
    print("\n")
    # Get comma separated form
    comma_separated_data = get_comma_separated_data(groups)
    # Pretty printing
    print(json.dumps(comma_separated_data, indent=4))
Output »
{
    "a": [
        1,
        5,
        10
    ],
    "b": [
        2
    ],
    "c": [
        3,
        8,
        9
    ],
    "d": [
        4,
        6
    ],
    "f": [
        7
    ]
}


{
    "group1": {
        "a": 1,
        "b": 2,
        "c": 3,
        "d": 4,
        "f": 7
    },
    "group2": {
        "a": 5,
        "c": 8,
        "d": 6
    },
    "group3": {
        "a": 10,
        "c": 9
    }
}


{
    "group1": "1a,2b,3c,4d,7f",
    "group2": "5a,8c,6d",
    "group3": "10a,9c"
}
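The same "one partial per parent per round" idea can be sketched more compactly with itertools.zip_longest, which walks the per-parent lists in parallel. This is only an alternative sketch, not the code above: the helper name `groups_by_round` is made up, it assumes no partial is ever None, and it relies on dictionaries keeping insertion order (Python 3.7+).

```python
from collections import defaultdict
from itertools import zip_longest

def groups_by_round(partials, parents):
    # Hypothetical helper: collect the partials of each parent in order
    by_parent = defaultdict(list)
    for partial, parent in zip(partials, parents):
        by_parent[parent].append(partial)
    groups = []
    # Each round holds one slot per parent; exhausted parents yield None
    for round_ in zip_longest(*by_parent.values()):
        group = {parent: partial
                 for parent, partial in zip(by_parent, round_)
                 if partial is not None}
        groups.append(group)
    return groups

partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']
rounds = groups_by_round(partials, parents)
print(len(rounds))  # 3
print(rounds[1])    # {'a': 5, 'c': 8, 'd': 6}
```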