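Transforming a list using regular expressions

Asked: 2016-08-10 13:56:05

Tags: python regex

I have a list containing elements of the following form. The strings may change, but the format stays similar: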

["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]

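I want to convert it to the list below. As you can see, it drops duplicates of strings that occur more than once (for example, Eth appears only once in the new list) and replaces the numbers with X and Y to make the names more generic: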

["RadioX","TetherX","SerialX/Y","EthX/Y","vlanX","modemX"]

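I have been messing around with different regular expressions and my approach is very messy, so I am interested in any elegant solution you can think of.

Here is some code that could be improved. Also, set does not preserve order, so that should be improved too: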

import re

a = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth0/2","Eth1/0","vlanX","modem0","modem1","modem2","modem3","modem6"]
c = []
for i in a:
    b = re.split("[0-9]", i)
    if "/" in i:
        c.append(b[0] + "X/Y")
    elif len(b) > 1:
        c.append(b[0] + "X")
    else:
        # no digits: keep the string itself (b is a one-element list)
        c.append(b[0])
print(set(c))

set(['modemX', 'TetherX', 'RadioX', 'vlanX', 'SerialX/Y', 'EthX/Y'])

A possible improvement over set that preserves the order:

unique = []
for item in c:
    if item not in unique:
        unique.append(item)
print(unique)

['RadioX', 'TetherX', 'SerialX/Y', 'EthX/Y', 'vlanX', 'modemX']

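5 answers:

Answer 0 (score: 2)

The following code should be generic enough to allow up to 3 numbers in the string, and you only need to change the repl variable to handle more numbers.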

import re

elements = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]
repl = "XYZ"

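# Each pass replaces the first remaining number with the next letter.
# \d+ (rather than [0-9]) treats a multi-digit number as a single number.
for letter in repl:
    elements = [re.sub(r"\d+", letter, element, count=1) for element in elements]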

result = set(elements)
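
The question also asks for the original order to be kept, which a set does not guarantee. A minimal sketch of one way to do that, assuming Python 3.7 or later (where dict keeps insertion order) and using the hypothetical name ordered_result:

```python
import re

elements = ["Radio0", "Tether0", "Serial0/0", "Eth0/0", "Eth0/1", "Eth1/0",
            "Eth1/1", "vlanX", "modem0", "modem1", "modem2", "modem3", "modem6"]
repl = "XYZ"

# Replace the first remaining number with X, then the next with Y, and so on.
for letter in repl:
    elements = [re.sub(r"\d+", letter, element, count=1) for element in elements]

# dict keys are unique and keep insertion order (assumption: Python 3.7+).
ordered_result = list(dict.fromkeys(elements))
print(ordered_result)
# ['RadioX', 'TetherX', 'SerialX/Y', 'EthX/Y', 'vlanX', 'modemX']
```

On older interpreters, collections.OrderedDict.fromkeys should give the same ordering.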

Answer 1 (score: 1)

import re


def particular_case(string):
    return re.sub(r"\d+", "X", re.sub(r"\d+/\d+", "X/Y", string))


def generic_case(string, letters=('X', 'Y', 'Z')):
    len_letters = len(letters)
    list_matches = list(re.finditer(r'\d+', string))
    result, last_index = "", 0

    if len(list_matches) == 0:
        return string

    for index, match in enumerate(list_matches):
        result += string[last_index:
                         match.start(0)] + letters[index % len_letters]
        last_index = match.end(0)

    # append any text that follows the last number
    return result + string[last_index:]

if __name__ == "__main__":
    words = ["Radio0", "Tether0", "Serial0/0", "Eth0/0", "Eth0/1", "Eth1/0",
             "Eth1/1", "vlanX", "modem0", "modem1", "modem2", "modem3", "modem6"]

    result = []
    result2 = []

    for w in words:
        new_value = particular_case(w)

        if new_value not in result:
            result.append(new_value)

        new_value = generic_case(w)

        if new_value not in result2:
            result2.append(new_value)

    print(result)
    print(result2)

Answer 2 (score: 1)

I use re.finditer to find and replace all the numbers:

import re

def repl(string):
    # use regex to find all numbers
    numbers = re.finditer(r'\d+', string)

    # pair the numbers with letters. zip will stop when the sequence of
    # numbers OR letters runs out. Replace from right to left so the match
    # offsets stay valid when a number is longer than one character.
    for match, char in reversed(list(zip(numbers, 'XYZ'))):  # add more characters if necessary
        string = string[:match.start()] + char + string[match.end():]
    return string

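# the input list from the question
l = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]

s = set()  # set to keep track of duplicates while maintaining order
result = []
for string in l: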
    string= repl(string)
    if string in s: #ignore if duplicate
        continue

    #otherwise add to result list
    s.add(string)
    result.append(string)

This replaces up to 3 numbers; the 'XYZ' string can easily be extended to support more.
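
The same idea can also be sketched in a single pass by giving re.sub a callable as the replacement, which avoids index arithmetic entirely. This is only an illustration: the function name generalize and the variable names are hypothetical, and numbers beyond the available letters are left unchanged by design.

```python
import re

def generalize(name, letters="XYZ"):
    """Replace the 1st, 2nd, ... run of digits with successive letters."""
    remaining = iter(letters)
    # re.sub calls the function once per match, left to right. When the
    # letters run out, next() falls back to the matched text itself.
    return re.sub(r"\d+", lambda m: next(remaining, m.group(0)), name)

names = ["Radio0", "Tether0", "Serial0/0", "Eth0/0", "Eth0/1", "Eth1/0",
         "Eth1/1", "vlanX", "modem0", "modem1", "modem2", "modem3", "modem6"]

seen = set()   # fast membership test
result = []    # keeps first-seen order
for name in names:
    generic = generalize(name)
    if generic not in seen:
        seen.add(generic)
        result.append(generic)

print(result)
# ['RadioX', 'TetherX', 'SerialX/Y', 'EthX/Y', 'vlanX', 'modemX']
```

Because the whole match is replaced at once, a multi-digit name such as "Eth10/20" becomes "EthX/Y" as well.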

Answer 3 (score: 1)

You could go for:

import re

rx = r'\d+'
incoming = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]

outgoing = []
for item in incoming:
    t = re.sub(rx, 'X', item)
    if t not in outgoing:
        outgoing.append(t)
print(outgoing)
# ['RadioX', 'TetherX', 'SerialX/X', 'EthX/X', 'vlanX', 'modemX']

Or (with the help of Python's mighty list comprehensions, just as another syntax example):

import re

rx = re.compile(r'\d+')
incoming = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]

def cleanitem(item):
    return rx.sub('X', item)

outgoing = []
[outgoing.append(item) \
    for item in (cleanitem(x) for x in incoming) \
    if item not in outgoing]
print(outgoing)

See a working demo on ideone.com

Answer 4 (score: 1)

import re
import functools

lst = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]

def process_str(s, letters='XY'):
    return functools.reduce(lambda txt, letter: re.sub(r'\d+', letter, txt, 1), letters, s)

r = set(map(process_str, lst))
print(r)