A general Python regex solution

Time: 2017-04-18 07:34:02

Tags: python regex

I need to split this string into a dictionary like the one shown below. Note that the keys may appear in a different order in the string.

String = 'Specialty: "Neurology: Neurology, NeuroScience", Profession: Nurse Practitioner, Source: TestSource'

Dict = { 'Specialty': "Neurology: Neurology, NeuroScience", 'Profession': 'Nurse Practitioner', 'Source': 'TestSource' }

A regex solution to this problem would be much appreciated.

2 answers:

Answer 0: (score: 1)

The simplest approach is to use a proper parser, such as pyparsing (pip install pyparsing):

from pyparsing import *

text = 'Specialty: "Neurology: Neurology, NeuroScience", Profession: Nurse Practitioner, Source: TestSource'

word = Word(alphas)
key = word + Suppress(':')
words = Combine(word + ZeroOrMore(" " + word))
value = (QuotedString('"') ^ words) + Optional(Suppress(', '))

dictionary = dictOf(key, value)

print(dictionary.parseString(text).asDict())
# => {'Source': 'TestSource', 'Profession': 'Nurse Practitioner', 'Specialty': 'Neurology: Neurology, NeuroScience'}

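We define a grammar: word is a run of letters; key is a word followed by a colon (which is suppressed from the output); words is a string made of one word, optionally followed by more words separated by spaces; value is either words or a double-quoted string, optionally followed by a comma (which we also discard); and dictionary is a list of key and value pairs. Then we let the parser do its thing.

Edit: but I suppose if you really want a regex solution...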
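The regex from this edit did not survive in the copy of the thread shown here, so what follows is only a sketch of one way it could be done, not the original answer's pattern. It assumes keys are single words, quoted values contain no escaped double quotes, and unquoted values contain no commas. The names PAIR_RE and parse_pairs are made up for illustration.

```python
import re

# A key, a colon, then either a double-quoted value or a bare value
# that runs up to the next comma.
PAIR_RE = re.compile(r'(\w+):\s*(?:"([^"]*)"|([^,]*))')

def parse_pairs(text):
    """Build a dict from 'Key: value, Key: "quoted, value"' style text."""
    result = {}
    for match in PAIR_RE.finditer(text):
        key, quoted, bare = match.groups()
        # Exactly one of the two value groups participates in each match.
        result[key] = quoted if quoted is not None else bare.strip()
    return result

text = 'Specialty: "Neurology: Neurology, NeuroScience", Profession: Nurse Practitioner, Source: TestSource'
print(parse_pairs(text))
```

Because each match consumes the whole value, the colon and comma inside the quoted Specialty value are never mistaken for separators, and the order of the keys in the string does not matter.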


Answer 1: (score: 0)

You need to go about it like this:

def create_dict(string, splitter=',', dict_splitter=':'):
    _dict = {}

    for item in string.split(splitter):
        # Split on the first separator only, then trim surrounding spaces
        key, value = item.split(dict_splitter, 1)
        _dict[key.strip()] = value.strip()

    return _dict

string = 'Specialty: "Neurology; Neurology NeuroScience", Profession: Nurse Practitioner, Source: TestSource'

_dict = create_dict(string)

for k, v in _dict.items():
    print(k, '\t', v)


# The output should look like this:

# Specialty    "Neurology; Neurology NeuroScience"
# Profession   Nurse Practitioner
# Source       TestSource
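Note that the string in this answer differs from the one in the question: the quoted value here has no comma or colon inside it, which is why a plain split works. For the question's original string, a rough sketch of the same split-based idea could split only on commas that sit outside double quotes. The names COMMA_OUTSIDE_QUOTES and create_dict_quoted are hypothetical, and the sketch assumes the quotes are balanced and never escaped.

```python
import re

# A comma counts as a separator only if an even number of double quotes
# follows it, i.e. the comma is not inside a quoted value.
COMMA_OUTSIDE_QUOTES = re.compile(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)')

def create_dict_quoted(string):
    result = {}
    for item in COMMA_OUTSIDE_QUOTES.split(string):
        # Only the first colon separates key from value.
        key, _, value = item.partition(':')
        result[key.strip()] = value.strip().strip('"')
    return result

original = 'Specialty: "Neurology: Neurology, NeuroScience", Profession: Nurse Practitioner, Source: TestSource'
for k, v in create_dict_quoted(original).items():
    print(k, '\t', v)
```

Unlike the function above, this version also removes the surrounding double quotes from the value, to match the dictionary asked for in the question.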