How do I split a string and keep its separators in Python?

Asked: 2017-07-26 12:12:05

Tags: python dictionary

I have a string that looks like this:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"

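I want to split it on letters (A-Z or a-z) and put the associated values into a dictionary of lists. Each group of digits is associated with the letter that follows it. For example,

'M' is associated with 47482, 14, 7, etc.

'I' is associated with 4, 7, 1, etc.

'H' is associated with 236792.

My final data structure would look something like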

    dict = {
      M: [47482, 14, 7, ...],
      I: [4, 7, 1, ...],
      H: [236792]
    }

My attempt:

import re
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
tmp = re.split('[a-zA-Z]', string1)
print(tmp)

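re.split discards the letters, so I cannot get them back as separators. I need help building the data structure.

5 answers:

Answer 0: (score: 6)

You are on the right track, but you should use a slightly different regular expression together with re.findall. Like this: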

In [1]: string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"

In [2]: import re, collections

In [3]: p = re.compile("([0-9]+)([A-Za-z])")

In [4]: dct = collections.defaultdict(list)

In [5]: for number, letter in p.findall(string1):
    ...:     dct[letter].append(number)
    ...:      

In [6]: dct
Out[6]: 
defaultdict(list,
            {'D': ['8', '1', '17', '5', '7', '1', '5', '6', '3'],
             'H': ['236792'],
             'I': ['4', '7', '1', '4', '2', '7', '7', '22', '3', '3', '2', '4', '11', '3', '3', '15'],
             'M': ['47482', '14', '7', '26', '25', '20', '11', '17', '7', '14', '35', '30', '15', '16', '4', '15', '37', '24', '5', '27', '35', '10', '5', '24', '175', '13']})

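This finds every run of digits that is followed by a letter and stores each pair in a dictionary, with the letter as the key and the numbers collected in a list, so repeated numbers are kept. Note that the numbers are stored as strings here; use dct[letter].append(int(number)) if you need integers.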
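For the literal question in the title, re.split can also return the separators if the pattern is wrapped in a capturing group. The following is a minimal sketch of that variant, not the answerer's code: the helper name group_by_letter and the shortened sample string (a prefix of string1) are assumptions made for illustration, and it assumes every letter is preceded by at least one digit.

```python
import re
from collections import defaultdict


def group_by_letter(text):
    """Group each run of digits under the letter that follows it (hypothetical helper)."""
    # With a capturing group, re.split keeps the separators in its result:
    # numbers land at even positions, letters at odd positions.
    parts = re.split(r"([A-Za-z])", text)
    grouped = defaultdict(list)
    # zip stops at the shorter sequence, so the trailing '' left after the
    # last letter is ignored.
    for number, letter in zip(parts[0::2], parts[1::2]):
        grouped[letter].append(int(number))  # int('') raises if a digit run is missing
    return dict(grouped)


sample = "47482M4I14M7I7M1I26M8D"  # shortened prefix of string1
split_parts = re.split(r"([A-Za-z])", sample)
result = group_by_letter(sample)
print(split_parts)
print(result)
```

Compared with re.findall, this keeps the original re.split call from the question and only changes the pattern, at the cost of having to pair the alternating items by hand.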

Answer 1: (score: 1)

Another solution, without using regular expressions:

import string
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"

result = dict()
tempValue = ''
for char in string1:

    if char not in string.ascii_letters:
        tempValue += char

    else:

        if char not in result:
            result[char] = []

        result[char].append(int(tempValue))
        tempValue = ''

print(result)

Result:

{
  'M': [47482, 14, 7, 26, 25, 20, 11, 17, 7, 14, 35, 30, 15, 16, 4, 15, 37, 24, 5, 27, 35, 10, 5, 24, 175, 13],
  'I': [4, 7, 1, 4, 2, 7, 7, 22, 3, 3, 2, 4, 11, 3, 3, 15],
  'D': [8, 1, 17, 5, 7, 1, 5, 6, 3],
  'H': [236792]
}

Answer 2: (score: 1)

If you do not want to use regular expressions, you can write the loop yourself.

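# string1 is the string defined in the question
myDict = {}
num_string = ''

for char in string1:
    if char.isalpha():
        # a letter closes the current run of digits
        myDict.setdefault(char, []).append(int(num_string))
        num_string = ''
    elif char.isdigit():
        num_string += char

Note: do not use dict as a variable name. It is not a keyword, but it is the name of the built-in type, and assigning to it hides the built-in.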
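To illustrate why reusing the name dict is risky, here is a small sketch. The function name shadowing_demo is hypothetical, and the shadowing is kept inside a function so that the built-in stays intact everywhere else.

```python
def shadowing_demo():
    """Show what happens when a local variable is named dict (hypothetical demo)."""
    dict = {"M": [47482]}  # this local name now hides the built-in dict type
    try:
        dict(H=[236792])  # tries to call a dict instance, not the type
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)
```

Outside the function the built-in is untouched, which is why a neutral name such as myDict or result is the safer choice.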

Answer 3: (score: 0)

Without using regular expressions:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"


d = {}
str_num = ''
for c in string1:
    if c.isdigit():
        str_num += c
    else:
        if c not in d:
            d[c] = []
        d[c].append(int(str_num))
        str_num = ''

print(d)
# Output (values are ints because of int(str_num); key order depends on the Python version):
# {'M': [47482, 14, 7, 26, 25, 20, 11, 17, 7, 14, 35, 30, 15, 16, 4, 15, 37, 24, 5, 27, 35, 10, 5, 24, 175, 13], 'I': [4, 7, 1, 4, 2, 7, 7, 22, 3, 3, 2, 4, 11, 3, 3, 15], 'D': [8, 1, 17, 5, 7, 1, 5, 6, 3], 'H': [236792]}

Answer 4: (score: 0)

Also without regexp:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H"
abc = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

s = ''
for k in string1:
    if k.isalpha():
        print('found', k, 'value', s)
        #add to dict here
        s = ''
    else:
        s += k
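The "add to dict here" placeholder above is left open by the answerer. One possible way to fill it, as a sketch only: the result dictionary, the use of setdefault, and the shortened input (a prefix of string1) are assumptions, not part of the original answer.

```python
string1 = "47482M4I14M7I7M1I26M8D"  # shortened prefix of the original string

result = {}  # hypothetical name for the target dictionary
s = ''
for k in string1:
    if k.isalpha():
        # the "add to dict here" step: create the list on first use, then append
        result.setdefault(k, []).append(int(s))
        s = ''
    else:
        s += k

print(result)
```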