Searching using substrings in Python

Time: 2018-02-01 07:40:35

Tags: python text-processing python-textprocessing

I have a txt file with two columns, as shown below:

LocationIndex   ID
P-1-A100A100    X000PY66QL
P-1-A100A100    X000RE0RRD
P-1-A100A101    X000R39WBL
P-1-A100A103    X000LJ7MX1
P-1-A100A104    X000S5QZMH
P-1-A100A105    X000MUMNOR
P-1-A100A105    X000S5R571
P-1-A100B100    X000MXVHFZ
P-1-A100B100    X000Q18233
P-1-A100B100    X000S6RSZJ
P-1-A100B101    X000K7C4HN
P-1-A100B102    X000RN9U59
P-1-A100B103    X000R4MZE1
P-1-A100B104    X000K9HSKT
P-1-A100C101    X000MCB5DZ
P-1-A100C101    X000O0T0RX
P-1-A100C102    X000RULTGZ
P-1-A100C104    X000O5NXKN
P-1-A100C104    X000RN3G9F
P-1-A100C105    X000D4P1P5
P-1-A100C105    X000QNBKDF
P-1-A100D100    X000FADDHP
P-1-A100D100    X000KR34DB
P-1-A100D100    X000MPCZ1X
P-1-A100D100    X000S6TO0B
P-1-A100D101    B00PANFBJ2
P-1-A100D101    X000Q1IYQD
P-1-A100D101    X000QEMDV7
P-1-A100D101    X000QHRKM1
P-1-A100D101    X000RUGIKR
P-1-A100D102    X000FF656L
P-1-A100D102    X000S13C5J

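Using LocationIndex as the search index, I need to find which adjacent locations have the same ID.

Definition of adjacent locations:

The left and right positions of a given location index are obtained by changing the last character of the index. For example, for P-1-A100B103 the left is P-1-A100B102 and the right is P-1-A100B104 (the last digit is in the range 0-5).

The top and bottom positions of a given location index are obtained by changing the fourth-from-last character of the index. For example, for P-1-A100B103 the top is P-1-A100C103 and the bottom is P-1-A100A103 (the fourth-from-last character is in the range A-E).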
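The adjacency rule above can be sketched as a small helper. This is only an illustration: the function name `neighbors` is hypothetical, and it assumes every location index ends in a row letter (A-E) followed by three characters whose last one is a digit (0-5), as in the sample data.

```python
def neighbors(loc):
    """Return the in-range neighbours of a location index, keyed by direction."""
    result = {}
    digit = int(loc[-1])   # last character: position digit, assumed 0-5
    row = loc[-4]          # fourth-from-last character: row letter, assumed A-E
    if digit > 0:
        result["left"] = loc[:-1] + str(digit - 1)
    if digit < 5:
        result["right"] = loc[:-1] + str(digit + 1)
    if row < "E":
        result["top"] = loc[:-4] + chr(ord(row) + 1) + loc[-3:]
    if row > "A":
        result["bottom"] = loc[:-4] + chr(ord(row) - 1) + loc[-3:]
    return result


print(neighbors("P-1-A100B103"))
```

For P-1-A100B103 this yields the four neighbours listed in the definition; for an edge such as P-1-A100A100 only the right and top entries are returned.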

I need to check whether the ID of a given location index (for example P-1-A100B103) matches the ID of any of its left, right, top or bottom location indices.

I tried the following:

with open('Test.txt', 'r') as f:
    next(f)  # skip the header row
    for line in f:
        fields = line.split()
        x = fields[0]  # LocationIndex
        y = fields[1]  # ID
        # eliminating corner cases
        if 0 < int(x[-1]) < 5 and x[-4] not in ('A', 'E'):
            right = x[:-1] + str(int(x[-1]) + 1)
            left = x[:-1] + str(int(x[-1]) - 1)
            top = x[:-4] + chr(ord(x[-4]) + 1) + x[-3:]
            bottom = x[:-4] + chr(ord(x[-4]) - 1) + x[-3:]
            # how to search ID for individual right, left, top and bottom?

I can do this elsewhere, but I need to do it in Python. Any hints/help would be appreciated.

1 Answer:

Answer 0 (score: 1)

A bit long and not the most efficient, but it gets the job done:

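FILE_PATH = 'Test.txt'  # input file; name taken from the question


def getData():
    """Map each LocationIndex to the set of IDs found there."""
    loc_keys = {}
    with open(FILE_PATH, 'r') as f:
        next(f)  # skip the header row
        for line in f:
            line = line.split()
            if len(line) < 2:
                continue  # skip blank or malformed lines
            loc, key = line[0], line[1]
            loc_keys.setdefault(loc, set()).add(key)

    return loc_keys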


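def is_adjacent(loc1, loc2):
    # Left/right: everything but the last character matches,
    # and the last digits differ by exactly 1.
    if loc1[:-1] == loc2[:-1]:
        return abs(int(loc1[-1]) - int(loc2[-1])) == 1
    # Top/bottom: everything but the fourth-from-last character matches,
    # and the row letters differ by exactly one step.
    if loc1[:-4] == loc2[:-4] and loc1[-3:] == loc2[-3:]:
        return abs(ord(loc1[-4]) - ord(loc2[-4])) == 1
    return False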


def find_matches(loc, loc_keys):
    if loc not in loc_keys:
        return None

    keys = loc_keys[loc]  # Set of keys for the input location
    matches = set()
    for other, other_keys in loc_keys.items():
        # Keep adjacent locations that share at least one ID with loc
        if is_adjacent(loc, other) and not keys.isdisjoint(other_keys):
            matches.add(other)

    return matches


# Call find_matches( <some LocationIndex>, getData() )
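Since the answer describes itself as not the most efficient (it compares the input against every location in the file), here is a sketch of an alternative that builds the at most four neighbour indices directly and looks each one up in the dictionary. The names `load_ids`, `candidate_neighbors` and `matching_neighbors` are hypothetical, the sample rows are made up for illustration, and the code assumes the index layout described in the question (row letter A-E fourth from the end, digit 0-5 at the end).

```python
from collections import defaultdict


def load_ids(lines):
    """Map each LocationIndex to the set of IDs stored there."""
    loc_ids = defaultdict(set)
    for line in lines:
        parts = line.split()
        # Skip blank lines and the header row
        if len(parts) == 2 and parts[0] != "LocationIndex":
            loc_ids[parts[0]].add(parts[1])
    return dict(loc_ids)


def candidate_neighbors(loc):
    """Left, right, top and bottom neighbours that stay inside 0-5 / A-E."""
    digit, row = int(loc[-1]), loc[-4]
    out = []
    if digit > 0:
        out.append(loc[:-1] + str(digit - 1))                 # left
    if digit < 5:
        out.append(loc[:-1] + str(digit + 1))                 # right
    if row < "E":
        out.append(loc[:-4] + chr(ord(row) + 1) + loc[-3:])   # top
    if row > "A":
        out.append(loc[:-4] + chr(ord(row) - 1) + loc[-3:])   # bottom
    return out


def matching_neighbors(loc, loc_ids):
    """Return {neighbour: shared IDs} for neighbours sharing an ID with loc."""
    own = loc_ids.get(loc, set())
    matches = {}
    for nb in candidate_neighbors(loc):
        shared = own & loc_ids.get(nb, set())
        if shared:
            matches[nb] = shared
    return matches


# Made-up rows for illustration only (the IDs are not from the question's file)
sample = [
    "LocationIndex   ID",
    "P-1-A100B102    ID1",
    "P-1-A100B103    ID1",
    "P-1-A100B103    ID2",
    "P-1-A100C103    ID2",
    "P-1-A100D103    ID2",   # two rows away from B103, so not adjacent
]
result = matching_neighbors("P-1-A100B103", load_ids(sample))
print(result)
```

With these rows, P-1-A100B103 matches its left neighbour P-1-A100B102 on ID1 and its top neighbour P-1-A100C103 on ID2, while P-1-A100D103 is ignored because it is two rows away. To read from a real file, pass the open file object to `load_ids` in place of the list.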