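You are given information about the users of your website. The information includes a username, a phone number, and/or an email address. Write a program that takes a list of tuples, where each tuple holds the information for one user, and returns a list of lists, where each sublist contains the indices of the tuples that hold information about the same person. For example: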
Input:
[("MLGuy42", "andrew@example.com", "123-4567"),
("CS229DungeonMaster", "123-4567", "ml@example.net"),
("Doomguy", "john@example.org", "carmack@example.com"),
("andrew26", "andrew@example.com", "mlguy@example.com")]
Output:
[[0, 1, 3], [2]]
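since "MLGuy42", "CS229DungeonMaster", and "andrew26" are all the same person.
Each sublist in the output should be sorted, and the outer list should be sorted by the first element of each sublist.
Below is the code I wrote for this problem. It seems to work fine, but I would like to know whether there is a better or more optimized solution. Any help would be appreciated. Thanks!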
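def find_duplicates(user_info):
    results = list()  # one sublist of user indices per person
    seen = dict()     # identifier -> position of its group in results
    for i, user in enumerate(user_info):
        first_seen = True
        key_info = None
        # Look for the first identifier that already belongs to a group.
        for info in user:
            if info in seen:
                first_seen = False
                key_info = info
                break
        if first_seen:
            # No identifier matched: start a new group.
            results.append([i])
            pos = len(results) - 1
        else:
            # Join the group of the first matching identifier.
            index = seen[key_info]
            results[index].append(i)
            pos = index
        # Record every identifier of this user under the chosen group.
        for info in user:
            seen[info] = pos
    return results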
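One case worth checking with a single-pass approach like the one above is a record that shares identifiers with two groups that already exist: it joins the group of the first match, and the two groups stay separate. A minimal sketch of one possible alternative is a union-find (disjoint-set) structure, shown here under the assumption that any shared identifier means the same person. The names `group_users` and `first_owner` are hypothetical and not part of the original post.

```python
def group_users(user_info):
    """Group record indices that share at least one identifier (union-find sketch)."""
    parent = list(range(len(user_info)))  # parent[i] == i means i is a root

    def find(i):
        # Walk up to the root, shortening the path along the way (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_owner = {}  # identifier -> index of the first record that used it
    for i, user in enumerate(user_info):
        for info in user:
            if info in first_owner:
                union(i, first_owner[info])
            else:
                first_owner[info] = i

    # Collect indices per root; indices are appended in increasing order,
    # so every sublist is already sorted.
    groups = {}
    for i in range(len(user_info)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With the input from the question this returns [[0, 1, 3], [2]]. For [("a", "b"), ("c", "d"), ("b", "c")], where the third record links the first two, it returns [[0, 1, 2]].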
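Answer 0 (score: 1)
I think I have reached an optimized working solution using a graph. Basically, I build a graph in which each node holds a user's information and its index. Then I traverse the graph with DFS to find the duplicates.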
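The answer does not include its code, so the following is only a sketch of one way the graph idea could look. It assumes that records are the nodes and that two records are connected whenever they share an identifier; the names `find_duplicates_dfs` and `records_by_info` are hypothetical.

```python
from collections import defaultdict


def find_duplicates_dfs(user_info):
    """Group record indices by connected component, using an iterative DFS."""
    # Implicit edges: every identifier points to all records that contain it.
    records_by_info = defaultdict(list)
    for i, user in enumerate(user_info):
        for info in user:
            records_by_info[info].append(i)

    visited = set()
    results = []
    for start in range(len(user_info)):
        if start in visited:
            continue
        # Explore everything reachable from this unvisited record.
        visited.add(start)
        stack = [start]
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            for info in user_info[i]:
                for j in records_by_info[info]:
                    if j not in visited:
                        visited.add(j)
                        stack.append(j)
        results.append(sorted(component))
    # Start indices are visited in increasing order, so the outer list is
    # already ordered by the first element of each sublist.
    return results
```

Each record and each identifier occurrence is handled a bounded number of times, so the running time grows roughly linearly with the total number of identifiers, apart from the sort inside each component.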
Answer 1 (score: 0)
I think we can simplify it using sets:
from random import shuffle

def find_duplicates(user_info):
    # Map each user's set of identifiers to the list of positions it covers.
    reduced = unreduced = {frozenset(info): [i] for i, info in enumerate(user_info)}

    # Repeat until a full pass merges nothing.
    while reduced is unreduced or len(unreduced) > len(reduced):
        unreduced = dict(reduced)  # make a copy

        for identifiers_1, positions_1 in unreduced.items():
            for identifiers_2, positions_2 in unreduced.items():
                if identifiers_1 is identifiers_2:
                    continue
                if identifiers_1 & identifiers_2:
                    # The two sets overlap: merge them into one entry.
                    del reduced[identifiers_1], reduced[identifiers_2]
                    reduced[identifiers_1 | identifiers_2] = positions_1 + positions_2
                    break
            else:  # no break
                continue
            break

    return sorted(sorted(value) for value in reduced.values())

my_input = [
    ("CS229DungeonMaster", "123-4567", "ml@example.net"),
    ("Doomguy", "john@example.org", "carmack@example.com"),
    ("andrew26", "andrew@example.com", "mlguy@example.com"),
    ("MLGuy42", "andrew@example.com", "123-4567"),
]

shuffle(my_input)  # shuffle to prove order independence
print(my_input)
print(find_duplicates(my_input))

Output
> python3 test.py
[('CS229DungeonMaster', '123-4567', 'ml@example.net'), ('MLGuy42', 'andrew@example.com', '123-4567'), ('andrew26', 'andrew@example.com', 'mlguy@example.com'), ('Doomguy', 'john@example.org', 'carmack@example.com')]
[[0, 1, 2], [3]]
>