How can I detect similar unordered sequences?

Asked: 2014-08-13 15:36:12

Tags: python regex string list sequence

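I want to find similar intersections in a road network. My approach is to find the most similar sequence of street names. I have created several lists of names: one is the reference, and the other two are the corresponding sequences to compare against it. I want to find the one that has the same street names with the same number of occurrences.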

Note that the order of the names does not matter; only the number of occurrences of each name matters.

Example:

Reference name sequence:
[u'Barytongatan', u'Tunnlandsgatan', u'Barytongatan']

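The corresponding name sequences of the neighbors are:
    {91: [u'Barytongatan', u'Tunnlandsgatan', u'Barytongatan'], 142: [u'Tunnlandsgatan', u'Tunnlandsgatan', u' ']}

First, I need to know whether a solution to this problem already exists. Second, is a list a good choice of container for the sequences? Finally, if so, how can I solve it?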

I thought of regular expressions, but they seem useless here because the order is not fixed.

1 Answer:

Answer 0: (score: 1)

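If you build a map of how many times each name occurs in a sequence, and then subtract one from the matching count for every name in the reference list, a sequence matches when every count ends at zero. That gives you the right answer even if the names in the sequence are in a different order.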

reference = [u'Barytongatan', u'Tunnlandsgatan', u'Barytongatan']
sequence = {91: [u'Barytongatan', u'Tunnlandsgatan', u'Barytongatan'], 142: [u'Tunnlandsgatan', u'Tunnlandsgatan', u' ']}
def getMatching(reference, sequence):
    for value in sequence.values():
        tempMap = {}
        for v in value:
            try:
                tempMap[v] += 1
            except KeyError:
                tempMap[v] = 1

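        # tempMap now maps each name in this sequence to its number of occurrences
        for v in reference:
            # Every time we find this name in the 'reference' list, subtract one from its occurrence count.
            # A name that is missing from this sequence ends up at -1, so the sequence is rejected below
            # (silently skipping it would let a shorter sequence match a longer reference).
            tempMap[v] = tempMap.get(v, 0) - 1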

        # Loop through each value in the map, and make sure the occurrence is 0
        for v in tempMap.values():
            if v != 0:
                break
        else:
            # This else statement is for the for loop, if the else fires, then all the values were 0
            return value
        continue
    return None

print(getMatching(reference, sequence))  # Prints the list stored under key 91 (Python 2 shows it as [u'Barytongatan', u'Tunnlandsgatan', u'Barytongatan'])

Now, if you have this instead, it still works:

reference = [u'Barytongatan', u'Tunnlandsgatan', u'Barytongatan']
sequence = {142: [u'Tunnlandsgatan', u'Tunnlandsgatan', u' '], 91: [u'Barytongatan', u'Barytongatan', u'Tunnlandsgatan']}
print(getMatching(reference, sequence))  # Prints the list stored under key 91, even though its names are not in the same order as reference
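
As for whether a ready-made solution exists: the counting map above is essentially what collections.Counter from the standard library (Python 2.7 and later) provides, and two Counters compare equal when they hold the same names with the same counts, regardless of order. A minimal sketch under that assumption follows; the name find_matching_keys and the choice to return the dictionary keys rather than the lists are mine, not part of the question.

```python
from collections import Counter


def find_matching_keys(reference, sequences):
    """Return the keys whose name list holds the same names, with the
    same number of occurrences, as the reference list (order ignored)."""
    target = Counter(reference)
    # Counter equality compares per-name counts, so order does not matter
    return [key for key, names in sequences.items() if Counter(names) == target]


reference = [u'Barytongatan', u'Tunnlandsgatan', u'Barytongatan']
sequences = {
    91: [u'Barytongatan', u'Barytongatan', u'Tunnlandsgatan'],
    142: [u'Tunnlandsgatan', u'Tunnlandsgatan', u' '],
}

print(find_matching_keys(reference, sequences))  # [91]
```

Keeping the names in lists is fine with this approach, since the Counter is only built at comparison time. If the same sequences are compared many times, storing the Counters instead of rebuilding them may be worth considering.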