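I need to split this string into a dictionary that looks like the one below. Note that the keys may appear in a different order in the string.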
String = 'Specialty: "Neurology: Neurology, NeuroScience", Profession: Nurse Practitioner, Source: TestSource'
Dict = { 'Specialty': "Neurology: Neurology, NeuroScience", 'Profession': 'Nurse Practitioner', 'Source': 'TestSource' }
A regex solution to this problem would be much appreciated.
Answer 0 (score: 1)
The simplest approach is to use a proper parser, such as pyparsing (pip install pyparsing):
from pyparsing import Combine, Optional, QuotedString, Suppress, Word, ZeroOrMore, alphas, dictOf
text = 'Specialty: "Neurology: Neurology, NeuroScience", Profession: Nurse Practitioner, Source: TestSource'
word = Word(alphas)
key = word + Suppress(':')
words = Combine(word + ZeroOrMore(" " + word))
value = (QuotedString('"') ^ words) + Optional(Suppress(', '))
dictionary = dictOf(key, value)
print(dictionary.parseString(text).asDict())
# => {'Source': 'TestSource', 'Profession': 'Nurse Practitioner', 'Specialty': 'Neurology: Neurology, NeuroScience'}
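We define a grammar in which word is a sequence of letters; key is a word followed by a colon (which we discard); words is a string made up of one word, optionally followed by more words separated by spaces; value is either words or a double-quoted string, optionally followed by a comma (which we also discard); and dictionary is a list of key/value pairs. Then we let the parser do its thing.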
Edit: but I suppose if you really want a regex solution...
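A regex-based approach could look roughly like the sketch below. It is not the answerer's original code; the names PAIR_RE and parse_pairs are made up for illustration. It assumes keys are single words, quoted values contain no escaped double quotes, and unquoted values contain no commas.

```python
import re

# Hypothetical pattern: a word key, a colon, then either a double-quoted
# value (group 2) or a bare value running up to the next comma (group 3).
PAIR_RE = re.compile(r'(\w+):\s*(?:"([^"]*)"|([^,]*))')

def parse_pairs(text):
    """Return a dict of key/value pairs found in text (illustrative helper)."""
    result = {}
    for match in PAIR_RE.finditer(text):
        key, quoted, bare = match.groups()
        # Prefer the quoted group when it matched; otherwise trim the bare value.
        result[key] = quoted if quoted is not None else bare.strip()
    return result

text = 'Specialty: "Neurology: Neurology, NeuroScience", Profession: Nurse Practitioner, Source: TestSource'
print(parse_pairs(text))
```

Because each quoted value is consumed as a whole, the colon and comma inside it are never mistaken for separators, and the order of the keys in the string does not matter.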
Answer 1 (score: 0)
You need to go about it like this:
def create_dict(string, splitter=',', dict_splitter=':'):
    _dict = {}
    for item in string.split(splitter):
        # maxsplit=1 keeps any further colons inside the value
        key, value = item.split(dict_splitter, 1)
        # strip the whitespace left around keys and values by the split
        _dict[key.strip()] = value.strip()
    return _dict

# Note: the quoted value here contains no comma, so a plain split on ',' is safe
string = 'Specialty: "Neurology; Neurology NeuroScience", Profession: Nurse Practitioner, Source: TestSource'
_dict = create_dict(string)
for k, v in _dict.items():
    print(k, '\t', v)
# The output should look like this
# Specialty "Neurology; Neurology NeuroScience"
# Profession Nurse Practitioner
# Source TestSource
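The function above splits on every comma, so it only works when quoted values contain none; the string in the question has one inside the quotes. A minimal sketch of a quote-aware variant follows. The names OUTSIDE_QUOTES_COMMA and create_dict_quoted are hypothetical, and it assumes double quotes are always balanced and never escaped.

```python
import re

# A comma is treated as a separator only if an even number of double quotes
# follows it, which means the comma sits outside any quoted value.
OUTSIDE_QUOTES_COMMA = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def create_dict_quoted(string):
    """Quote-aware variant of create_dict (illustrative, not from the answer)."""
    result = {}
    for item in OUTSIDE_QUOTES_COMMA.split(string):
        # partition splits on the first colon only, so later colons stay in the value
        key, _, value = item.partition(':')
        # trim surrounding whitespace, then the enclosing double quotes if present
        result[key.strip()] = value.strip().strip('"')
    return result

original = 'Specialty: "Neurology: Neurology, NeuroScience", Profession: Nurse Practitioner, Source: TestSource'
for k, v in create_dict_quoted(original).items():
    print(k, '\t', v)
```

The lookahead rescans the rest of the string for every comma, which is fine for short inputs like this one but may be slow on very long strings.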