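Given columns A and B, how do I find, for each item in column A, the most probable item in column B? Would something based on a nested hash map work? I want to do this in Python.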
INPUT:
a,abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5
a,abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5
a,abd37534c7d9a2efb9465fghfghfghfghfghrewresdasdzfdghhgfhg
a,abd3753dfrtdgfdg563ae98078d6dfgfdgdfghdgasdaSADFBVFDGFD5
b,c681e18b81edaf2b66dd22376734dba5992e362bc3f91ab225854c17
OUTPUT:
a,abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5
b,c681e18b81edaf2b66dd22376734dba5992e362bc3f91ab225854c17
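Answer 0 (score: 0)
I'll assume that "most probable" means the value that occurs most often for each of {a, b}.
The following should work, though it may have some syntax issues. Either way, it should give you an idea of how to approach the problem, even if it doesn't solve it outright.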
tupleList = [('a', 'abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5'),
             ('a', 'abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5'),
             ('a', 'abd37534c7d9a2efb9465fghfghfghfghfghrewresdasdzfdghhgfhg'),
             ('a', 'abd3753dfrtdgfdg563ae98078d6dfgfdgdfghdgasdaSADFBVFDGFD5'),
             ('b', 'c681e18b81edaf2b66dd22376734dba5992e362bc3f91ab225854c17')]
# Load your list of a,blah into tupleList
# Nested map: col1 -> {col2 -> number of occurrences}
myHashMap = {}
for col1, col2 in tupleList:
    if col1 not in myHashMap:
        myHashMap[col1] = {}
    if col2 not in myHashMap[col1]:
        myHashMap[col1][col2] = 0
    myHashMap[col1][col2] += 1
# Now iterate over each key to find the value with the highest occurrence.
for col in myHashMap:
    maxKey = ''
    maxVal = 0
    # Look up counts under the current key (col), not the stale col1 from the loop above.
    for col2 in myHashMap[col]:
        if myHashMap[col][col2] > maxVal:
            maxVal = myHashMap[col][col2]
            maxKey = col2
    print('Most probable for %s is %s' % (col, maxKey))
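The same nested-counting idea can probably be written more compactly with the standard library's collections.Counter and collections.defaultdict. This is only a sketch: the function name most_common_per_key and the lines list are made up for illustration, and it assumes each input line is a "key,value" string as in the question. When two values tie for a key, the one seen first should win on recent Python 3 versions, but you may want your own tie-breaking rule.

```python
from collections import Counter, defaultdict


def most_common_per_key(pairs):
    """Return {key: most frequent value} for an iterable of (key, value) pairs.

    Hypothetical helper name, not taken from the question or the answer above.
    """
    counts = defaultdict(Counter)  # key -> Counter of the values seen for that key
    for key, value in pairs:
        counts[key][value] += 1
    # most_common(1) returns a one-element list [(value, count)]; keep only the value.
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


# Assumed input: the rows from the question as "key,value" strings.
lines = [
    "a,abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5",
    "a,abd37534c7d9a2efb9465de931cd7055ffdb8879563ae98078d6d6d5",
    "a,abd37534c7d9a2efb9465fghfghfghfghfghrewresdasdzfdghhgfhg",
    "a,abd3753dfrtdgfdg563ae98078d6dfgfdgdfghdgasdaSADFBVFDGFD5",
    "b,c681e18b81edaf2b66dd22376734dba5992e362bc3f91ab225854c17",
]
pairs = [line.split(",", 1) for line in lines]  # split on the first comma only

for key, value in most_common_per_key(pairs).items():
    print("%s,%s" % (key, value))
```

If the data lives in a file, reading it with the csv module instead of splitting strings by hand would likely be more robust.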