Can I do something similar in Python?
The Split method in VB.NET:
Dim line As String = "Tech ID: xxxxxxxxxx Name: DOE, JOHN Account #: xxxxxxxx"
Dim separators() As String = {"Tech ID:", "Name:", "Account #:"}
Dim result() As String
result = line.Split(separators, StringSplitOptions.RemoveEmptyEntries)
Answer 0 (score: 2)
Since data like this is not well formatted, you could try re.split():
>>> import re
>>> mystring = "Field 1: Data 1 Field 2: Data 2 Field 3: Data 3"
>>> a = re.split(r"(Field 1:|Field 2:|Field 3:)", mystring)
>>> a
['', 'Field 1:', ' Data 1 ', 'Field 2:', ' Data 2 ', 'Field 3:', ' Data 3']
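If the data were well formatted, with quoted strings and comma-separated records, your job would be easier. You could then use the csv module to parse the file of comma-separated values.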
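To illustrate the csv route, here is a minimal sketch. It assumes the record had been written as a quoted, comma-separated file with a header row; the sample data below is hypothetical and only mirrors the fields from the question.

```python
import csv
import io

# Hypothetical well-formed version of the record: a header row plus one
# data row, every field quoted and separated by commas.
data = '"Tech ID","Name","Account #"\n"xxxxxxxxxx","DOE, JOHN","xxxxxxxx"\n'

# DictReader takes the first row as the field names. The quoting keeps the
# comma inside "DOE, JOHN" from being treated as a field separator.
reader = csv.DictReader(io.StringIO(data))
records = list(reader)

print(records[0]["Name"])  # DOE, JOHN
```

With a real file you would pass an open file object (opened with newline="") instead of the io.StringIO wrapper.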
Edit:
You can filter out the empty entries with a list comprehension.
>>> a_non_empty = [s for s in a if s]
>>> a_non_empty
['Field 1:', ' Data 1 ', 'Field 2:', ' Data 2 ', 'Field 3:', ' Data 3']
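Because the pattern uses a capturing group, re.split() keeps the labels in the result, so labels and values alternate. If you want them paired up, something along these lines should work. It is only a sketch, and it assumes the string starts with a label, so the first element of the raw split is leading text that can be ignored.

```python
import re

mystring = "Field 1: Data 1 Field 2: Data 2 Field 3: Data 3"

# With a capturing group the result is always
# [text, label, text, label, ..., text], so slice the unfiltered list.
parts = re.split(r"(Field 1:|Field 2:|Field 3:)", mystring)
labels = parts[1::2]   # odd positions: the matched separators
values = parts[2::2]   # even positions after the first: the data

# Drop the trailing colon from each label and trim the values.
fields = {label.rstrip(":"): value.strip() for label, value in zip(labels, values)}

print(fields)  # {'Field 1': 'Data 1', 'Field 2': 'Data 2', 'Field 3': 'Data 3'}
```

Slicing the unfiltered list rather than the filtered one keeps labels and values aligned even when a field is empty.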
Answer 1 (score: 1)
>>> import re
>>> line = "Tech ID: xxxxxxxxxx Name: DOE, JOHN Account #: xxxxxxxx"
>>> re.split("Tech ID:|Name:|Account #:", line)
['', ' xxxxxxxxxx ', ' DOE, JOHN ', ' xxxxxxxx']
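The output above still has a leading empty string and padded values. To get closer to what StringSplitOptions.RemoveEmptyEntries does in the VB.NET code, you could filter and trim the pieces, roughly like this. Note that it is a sketch that goes a bit further than the VB version: it also strips surrounding whitespace and drops whitespace-only pieces.

```python
import re

line = "Tech ID: xxxxxxxxxx Name: DOE, JOHN Account #: xxxxxxxx"
separators = ["Tech ID:", "Name:", "Account #:"]

# Build the alternation from the separator list. re.escape protects any
# separator that happens to contain regex metacharacters.
pattern = "|".join(re.escape(s) for s in separators)

# Trim each piece and skip the ones that end up empty.
result = [piece.strip() for piece in re.split(pattern, line) if piece.strip()]

print(result)  # ['xxxxxxxxxx', 'DOE, JOHN', 'xxxxxxxx']
```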
Answer 2 (score: 0)
I'd suggest a different approach:
>>> import re
>>> subject = "Tech ID: xxxxxxxxxx Name: DOE, JOHN Account #: xxxxxxxx"
>>> regex = re.compile(r"(Tech ID|Name|Account #):\s*(.*?)\s*(?=Tech ID:|Name:|Account #:|$)")
>>> dict(regex.findall(subject))
{'Tech ID': 'xxxxxxxxxx', 'Name': 'DOE, JOHN', 'Account #': 'xxxxxxxx'}
This way you get a useful data structure for this kind of data: a dictionary.
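If the set of labels varies, the same pattern can be built from a list instead of being written out by hand. The helper below is a hypothetical sketch (the name parse_fields is made up for illustration) and assumes every label is followed by a colon, as in the question.

```python
import re

def parse_fields(text, labels):
    """Return a dict mapping each label found in text to its value."""
    # Escape the labels so characters like '#' are matched literally.
    alternation = "|".join(re.escape(label) for label in labels)
    # Same structure as the regex above: label, colon, lazy value, then a
    # lookahead for the next label (non-capturing) or the end of the string.
    pattern = re.compile(
        rf"({alternation}):\s*(.*?)\s*(?=(?:{alternation}):|$)"
    )
    return dict(pattern.findall(text))

subject = "Tech ID: xxxxxxxxxx Name: DOE, JOHN Account #: xxxxxxxx"
parsed = parse_fields(subject, ["Tech ID", "Name", "Account #"])

print(parsed)  # {'Tech ID': 'xxxxxxxxxx', 'Name': 'DOE, JOHN', 'Account #': 'xxxxxxxx'}
```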
The regex, with comments:
regex = re.compile(
r"""(?x) # Verbose regex:
(Tech\ ID|Name|Account\ \#) # Match identifier
: # Match a colon
\s* # Match optional whitespace
(.*?) # Match any number of characters, as few as possible
\s* # Match optional whitespace
(?= # Assert that the following can be matched:
Tech\ ID:|Name:|Account\ \#: # The next identifier
|$ # or the end of the string
) # End of lookahead assertion""")