Splitting a Python string on single quotes

Date: 2016-12-15 23:12:40

Tags: python text split

I have a string like this:

text = ['Adult'   'Adverse Drug Reaction Reporting Systems/*classification'   '*Drug-Related Side Effects and Adverse Reactions'   'Hospital Bed Capacity   300 to 499'   'Hospitals   County'   'Humans'   'Indiana'   'Pharmacy Service   Hospital/*statistics & numerical data']

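I need to split this string so that each category (each one is delimited by single quotation marks) is stored as a separate element of an array. For example: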

text = Adult, Adverse Drug Reaction Reporting Systems...

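I have already tried the split function, but I'm not sure how to go about it.

1 Answer:

Answer 0: (score: 1)

You can do something like this with a regular expression, assuming you have no constraints beyond the ones you listed: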

>>> s = "'Adult'   'Adverse Drug Reaction Reporting Systems/*classification'   '*Drug-Related Side Effects and Adverse Reactions'   'Hospital Bed Capacity   300 to 499'   'Hospitals   County'   'Humans'   'Indiana'   'Pharmacy Service   Hospital/*statistics & numerical data'"
>>> import re
>>> regex = re.compile(r"'[^']*'")
>>> regex.findall(s)
["'Adult'", "'Adverse Drug Reaction Reporting Systems/*classification'", "'*Drug-Related Side Effects and Adverse Reactions'", "'Hospital Bed Capacity   300 to 499'", "'Hospitals   County'", "'Humans'", "'Indiana'", "'Pharmacy Service   Hospital/*statistics & numerical data'"]

My regular expression leaves the ' characters in the matched strings - you can easily remove them with str.strip("'").

>>> [x.strip("'") for x in regex.findall(s)]
['Adult', 'Adverse Drug Reaction Reporting Systems/*classification', '*Drug-Related Side Effects and Adverse Reactions', 'Hospital Bed Capacity   300 to 499', 'Hospitals   County', 'Humans', 'Indiana', 'Pharmacy Service   Hospital/*statistics & numerical data']
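
If the separate strip step feels redundant, a capture group can return the quoted contents directly. The following is only a minimal sketch: the shortened sample string is an illustrative stand-in for s, and it relies on the same assumption that no escaped quotes appear inside the strings.

```python
import re

# Shortened sample in the same shape as the question's data (illustrative only).
s = "'Adult'   'Hospital Bed Capacity   300 to 499'   'Humans'"

# With exactly one capture group, findall returns only the group's text,
# i.e. the characters between each pair of single quotes.
categories = re.findall(r"'([^']*)'", s)
print(categories)

# The same idea with plain str.split: after splitting on the quote character,
# the quoted contents sit at the odd indices (1, 3, 5, ...).
via_split = s.split("'")[1::2]
print(via_split)
```

Both variants keep the internal runs of spaces (for example in 'Hospital Bed Capacity   300 to 499') untouched, since only the quote character is treated as a delimiter.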

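Note that this only works because I'm assuming you never have escaped quotes inside the strings... for example, that you never have:

'foo\'bar', which is a perfectly valid way to express a string in many programming environments. If you do have that situation, you'll need a more robust parser - for example, pyparsing: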

>>> import pyparsing as pp
>>> [x[0][0].strip("'") for x in pp.sglQuotedString.scanString(s)]
['Adult', 'Adverse Drug Reaction Reporting Systems/*classification', '*Drug-Related Side Effects and Adverse Reactions', 'Hospital Bed Capacity   300 to 499', 'Hospitals   County', 'Humans', 'Indiana', 'Pharmacy Service   Hospital/*statistics & numerical data']
>>> s2 = r"'foo\'bar' 'baz'"
>>> [x[0][0].strip("'") for x in pp.sglQuotedString.scanString(s2)]
["foo\\'bar", 'baz']
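
If adding a dependency is not an option, the regular expression itself can be extended to tolerate backslash-escaped quotes. This is a rough sketch rather than a full parser: it assumes the backslash is the only escape mechanism in the data, and the final unescaping step is an optional addition that the pyparsing example above does not perform.

```python
import re

# Same shape as the s2 example: the first item contains an escaped quote.
s2 = r"'foo\'bar' 'baz'"

# Between the quotes, accept either a backslash followed by any character,
# or any single character that is neither a quote nor a backslash.
pattern = re.compile(r"'((?:\\.|[^'\\])*)'")

raw = pattern.findall(s2)
print(raw)

# Optional: turn the escaped \' back into a plain quote character.
unescaped = [item.replace("\\'", "'") for item in raw]
print(unescaped)
```

The first list still contains the backslash, like the pyparsing result; the second list has it removed.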