How can I compare the elements of the lists stored as values in a Python map, and check whether at least n elements match?

Asked: 2018-12-26 17:24:52

Tags: python python-3.x

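I want to iterate over the values of a map and compare the elements of the lists to see whether at least 3 elements match in the same order, and then return a list of the keys that satisfy the condition.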


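In the sample map below, keys s1 and s3 have at least three matching elements in their list values: "a", "b", "c". So s1 and s3 should be returned as s1-s3. Similarly, s2 and s4 match and should also be returned, but s2 has more than one match, since it also matches s5, so s2-s5 should be returned as well. I want to return all possible matches for every key-value pair, as a list of key pairs. The sample map is: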

prefs = {
        's1': ["a", "b", "c", "d", "e"],
        's2': ["c", "d", "e", "a", "b"],
        's3': ["a", "b", "c", "d", "e"],
        's4': ["c", "d", "e", "b", "e"],
        's5': ["c", "d", "e", "a", "b"]
    }

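I can't figure out how to iterate over every value in the map; so far I only have a small snippet that compares the lists element by element. I wonder whether I could set up a counter, check whether match_cnt is at least 3, and then return the keys in a list. The returned output should look like: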

[[s1--s3], [s2--s4], [s2--s5], [s4--s5]]

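I would also like some insight into the running time of this algorithm. A complete code solution would be much appreciated. It was suggested that I open a new question here.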

2 answers:

Answer 0: (score: 1)

You can iterate over the map with .items() and then use slicing to compare the first three items of each list:

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}

results = []
for ki, vi in prefs.items():
    for kj, vj in prefs.items():
        if ki == kj:  # skip checking same values on same keys !
            continue

        if vi[:3] == vj[:3]:  # slice the lists to test first 3 characters
            match = tuple(sorted([ki, kj]))  # sort results to eliminate duplicates
            results.append(match)

print(set(results))  # print a unique set

This prints (under Python 3; the order of a set may vary):

{('s1', 's3'), ('s4', 's5'), ('s2', 's5'), ('s2', 's4')}
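
Regarding the running time: the nested loop above compares every key with every other key, so the work grows roughly quadratically with the number of keys. If only the first three items have to agree, one possible alternative is to group the keys by that prefix and then pair up the keys inside each group. The sketch below assumes this "same first n items" interpretation; the function name `pairs_with_same_prefix` is made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}


def pairs_with_same_prefix(mapping, n=3):
    """Return sorted key pairs whose lists start with the same n items."""
    groups = defaultdict(list)
    for key, values in mapping.items():
        if len(values) < n:  # too short to have a full prefix; skip it
            continue
        groups[tuple(values[:n])].append(key)  # lists are unhashable, tuples are not

    pairs = []
    for keys in groups.values():
        # every pair of keys inside one group shares the same prefix
        pairs.extend(combinations(sorted(keys), 2))
    return sorted(pairs)


print(pairs_with_same_prefix(prefs))
# [('s1', 's3'), ('s2', 's4'), ('s2', 's5'), ('s4', 's5')]
```

Building the groups takes one pass over the map; the remaining cost is proportional to the number of pairs that are actually reported.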

Edit:
To check all possible combinations, you can use combinations() from itertools. iCombinations / jCombinations hold the order-preserving combinations of 3 list items:

from itertools import combinations

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}

results = []
for ki, vi in prefs.items():
    for kj, vj in prefs.items():
        if ki == kj:  # skip checking same values on same keys !
            continue

        # alternative: only contiguous runs of 3 items (sliding window)
        # iCombinations = {tuple(vi[n:n+3]) for n in range(len(vi)-2)}
        # jCombinations = {tuple(vj[n:n+3]) for n in range(len(vj)-2)}

        # all order-preserving combinations of 3 items (not necessarily adjacent);
        # stored as sets because combinations() returns a one-shot iterator
        iCombinations = set(combinations(vi, 3))
        jCombinations = set(combinations(vj, 3))

        if iCombinations & jCombinations:  # at least one shared combination
            match = tuple(sorted([ki, kj]))  # sort results to eliminate duplicates
            results.append(match)

print(set(results))  # print a unique set

This prints (under Python 3; the order of a set may vary):

{('s1', 's3'), ('s2', 's5'), ('s3', 's5'), ('s2', 's3'), ('s2', 's4'), ('s1', 's4'), ('s1', 's5'), ('s3', 's4'), ('s4', 's5'), ('s1', 's2')}
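
Note that `itertools.combinations(values, 3)` keeps the order of the items but does not require them to be adjacent. If "in the same order" is meant as three adjacent items appearing anywhere in both lists, a sliding-window comparison may be closer to the intent. The following is a minimal sketch under that assumption; the helper names `windows` and `share_run` are hypothetical.

```python
from itertools import combinations

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}


def windows(values, n=3):
    """Return the set of all contiguous runs of n items in values."""
    return {tuple(values[i:i + n]) for i in range(len(values) - n + 1)}


def share_run(a, b, n=3):
    """True if a and b contain the same run of n adjacent items, at any position."""
    return not windows(a, n).isdisjoint(windows(b, n))


# combinations() over the sorted keys visits every unordered key pair exactly once
matches = [(ki, kj)
           for ki, kj in combinations(sorted(prefs), 2)
           if share_run(prefs[ki], prefs[kj])]

print(len(matches))  # 10
```

For this sample data all ten pairs still match, because every list contains the run "c", "d", "e" somewhere. The two definitions only give different results on data where the shared items are not adjacent.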

Answer 1: (score: 1)

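I tried to be as detailed as possible. This should also serve as an example of how you can often solve this kind of problem by inserting lots of print messages to create a log of what is happening.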

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}

# Get all items of prefs and sort them by key. (Sorting might not be
# necessary, that's something you'll have to decide.)
items_a = sorted(prefs.items(), key=lambda item: item[0])

# Make a copy of the items where we can delete the processed items.
items_b = items_a.copy()

# Set the length for each compared slice.
slice_length = 3

# Calculate how many comparisons will be necessary per item.
max_shift = len(items_a[0][1]) - slice_length

# Create an empty result list for all matches.
matches = []

# Loop all items
print("Comparisons:")
for key_a, value_a in items_a:
    # We don't want to check items against themselves, so we have to
    # delete the first item of items_b every loop pass (which would be
    # the same as key_a, value_a).
    del items_b[0]
    # Loop remaining other items
    for key_b, value_b in items_b:
        print("- Compare {} to {}".format(key_a, key_b))
        # We have to shift the compared slice
        for shift in range(max_shift + 1):
            # Start the slice at 0, then shift it
            start = 0 + shift
            # End the slice at slice_length, then shift it
            end = slice_length + shift
            # Create the slices
            slice_a = value_a[start:end]
            slice_b = value_b[start:end]
            print("  - Compare {} to {}".format(slice_a, slice_b), end="")
            if slice_a == slice_b:
                print(" -> Match!", end="")
                matches += [(key_a, key_b, shift)]
            print("")

print("Matches:")
for key_a, key_b, shift in matches:
    print("- At positions {} to {} ({} elements), {} matches with {}".format(
        shift + 1, shift + slice_length, slice_length, key_a, key_b))

Which prints:

Comparisons:
- Compare s1 to s2
  - Compare ['a', 'b', 'c'] to ['c', 'd', 'e']
  - Compare ['b', 'c', 'd'] to ['d', 'e', 'a']
  - Compare ['c', 'd', 'e'] to ['e', 'a', 'b']
- Compare s1 to s3
  - Compare ['a', 'b', 'c'] to ['a', 'b', 'c'] -> Match!
  - Compare ['b', 'c', 'd'] to ['b', 'c', 'd'] -> Match!
  - Compare ['c', 'd', 'e'] to ['c', 'd', 'e'] -> Match!
- Compare s1 to s4
  - Compare ['a', 'b', 'c'] to ['c', 'd', 'e']
  - Compare ['b', 'c', 'd'] to ['d', 'e', 'b']
  - Compare ['c', 'd', 'e'] to ['e', 'b', 'e']
- Compare s1 to s5
  - Compare ['a', 'b', 'c'] to ['c', 'd', 'e']
  - Compare ['b', 'c', 'd'] to ['d', 'e', 'a']
  - Compare ['c', 'd', 'e'] to ['e', 'a', 'b']
- Compare s2 to s3
  - Compare ['c', 'd', 'e'] to ['a', 'b', 'c']
  - Compare ['d', 'e', 'a'] to ['b', 'c', 'd']
  - Compare ['e', 'a', 'b'] to ['c', 'd', 'e']
- Compare s2 to s4
  - Compare ['c', 'd', 'e'] to ['c', 'd', 'e'] -> Match!
  - Compare ['d', 'e', 'a'] to ['d', 'e', 'b']
  - Compare ['e', 'a', 'b'] to ['e', 'b', 'e']
- Compare s2 to s5
  - Compare ['c', 'd', 'e'] to ['c', 'd', 'e'] -> Match!
  - Compare ['d', 'e', 'a'] to ['d', 'e', 'a'] -> Match!
  - Compare ['e', 'a', 'b'] to ['e', 'a', 'b'] -> Match!
- Compare s3 to s4
  - Compare ['a', 'b', 'c'] to ['c', 'd', 'e']
  - Compare ['b', 'c', 'd'] to ['d', 'e', 'b']
  - Compare ['c', 'd', 'e'] to ['e', 'b', 'e']
- Compare s3 to s5
  - Compare ['a', 'b', 'c'] to ['c', 'd', 'e']
  - Compare ['b', 'c', 'd'] to ['d', 'e', 'a']
  - Compare ['c', 'd', 'e'] to ['e', 'a', 'b']
- Compare s4 to s5
  - Compare ['c', 'd', 'e'] to ['c', 'd', 'e'] -> Match!
  - Compare ['d', 'e', 'b'] to ['d', 'e', 'a']
  - Compare ['e', 'b', 'e'] to ['e', 'a', 'b']
Matches:
- At positions 1 to 3 (3 elements), s1 matches with s3
- At positions 2 to 4 (3 elements), s1 matches with s3
- At positions 3 to 5 (3 elements), s1 matches with s3
- At positions 1 to 3 (3 elements), s2 matches with s4
- At positions 1 to 3 (3 elements), s2 matches with s5
- At positions 2 to 4 (3 elements), s2 matches with s5
- At positions 3 to 5 (3 elements), s2 matches with s5
- At positions 1 to 3 (3 elements), s4 matches with s5

It is not really clear what your output should actually look like. However, I think you can adapt the code above to your needs.
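
As one possible adaptation, the position-by-position comparison above can be condensed into a function that reports each matching pair once, in the "s1--s3" style shown in the question. This is only a sketch: the function name `aligned_matches` and the string output format are assumptions, not something the question specifies exactly.

```python
from itertools import combinations

prefs = {
    's1': ["a", "b", "c", "d", "e"],
    's2': ["c", "d", "e", "a", "b"],
    's3': ["a", "b", "c", "d", "e"],
    's4': ["c", "d", "e", "b", "e"],
    's5': ["c", "d", "e", "a", "b"]
}


def aligned_matches(mapping, n=3):
    """Return 'a--b' strings for key pairs whose lists agree on n
    consecutive items at the same positions."""
    result = []
    # visit every unordered pair of keys exactly once
    for key_a, key_b in combinations(sorted(mapping), 2):
        a, b = mapping[key_a], mapping[key_b]
        # number of start positions where both lists still have n items
        limit = min(len(a), len(b)) - n + 1
        if any(a[s:s + n] == b[s:s + n] for s in range(limit)):
            result.append("{}--{}".format(key_a, key_b))
    return result


print(aligned_matches(prefs))
# ['s1--s3', 's2--s4', 's2--s5', 's4--s5']
```

On the running time: with k keys and lists of length m, this checks k(k-1)/2 pairs, up to m-n+1 start positions per pair, and up to n item comparisons per position, so roughly O(k² · m · n) in the worst case.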