Suppose I have a text file, which I read in Python, that contains:
the data starts
test Age="0" Order="51" Doctor-ID="XX2342"
test Age="0" Order="53" Doctor-ID="XX2342"
end of data
What code would return:
"0" "51" "XX2342"
"0" "53" "XX2342"
Returning a list would also be fine:
[["0","51","XX2342"],
 ["0","53","XX2342"]]
Thanks!
Answer 0 (score: 1)
This is a perfect job for regular expressions:
>>> import re
>>> line = 'test Age="0" Order="51" Doctor-ID="XX2342"'
>>> re.findall(r'"(.*?)"', line)
['0', '51', 'XX2342']
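The ? after .* is what makes this work: it makes the match non-greedy, so it stops at the first closing quote instead of the last one. A minimal comparison on the same sample line, using only the standard re module:

```python
import re

line = 'test Age="0" Order="51" Doctor-ID="XX2342"'

# Greedy: .* runs as far as it can, so a single match spans from
# the first quote on the line to the last one.
greedy = re.findall(r'"(.*)"', line)

# Non-greedy: .*? stops at the first closing quote, so each
# quoted value becomes its own match.
lazy = re.findall(r'"(.*?)"', line)

print(greedy)  # ['0" Order="51" Doctor-ID="XX2342']
print(lazy)    # ['0', '51', 'XX2342']
```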
For multiple lines:
import re

lines = '''
test Age="0" Order="51" Doctor-ID="XX2342"
test Age="0" Order="53" Doctor-ID="XX2342"
'''
results = []
for line in lines.split('\n'):
    result = re.findall(r'"(.*?)"', line)
    if result:  # skip lines with no quoted values, e.g. the blank ones
        results.append(result)
for result in results:
    print(result)
This gives:
['0', '51', 'XX2342']
['0', '53', 'XX2342']
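Since the question also accepts a list of lists, the loop above can be collapsed into a single list comprehension. This is a sketch using the same sample string; compiling the pattern once is an optional choice, not something the answer requires:

```python
import re

lines = '''
test Age="0" Order="51" Doctor-ID="XX2342"
test Age="0" Order="53" Doctor-ID="XX2342"
'''

# Compile the pattern once and reuse it for every line.
pattern = re.compile(r'"(.*?)"')

# One inner list per line; lines with no quoted values
# (such as the blank line at the start of the string) are skipped.
results = [pattern.findall(line) for line in lines.splitlines() if pattern.search(line)]

print(results)  # [['0', '51', 'XX2342'], ['0', '53', 'XX2342']]
```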
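Answer 1 (score: 1)
You need to use .*? or [^"]* so that the pattern also matches a pair of double quotes with nothing between them ("") and returns an empty string for it.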
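To see why the * quantifier matters, compare it with + on a short made-up string (not from the question) that contains an empty value:

```python
import re

text = 'a="" b="x"'

# * allows zero characters between the quotes, so "" yields ''.
star_lazy = re.findall(r'"(.*?)"', text)
star_class = re.findall(r'"([^"]*)"', text)

# + requires at least one character, so the empty pair is not matched
# on its own: the match swallows the second quote and runs to the
# opening quote of the next value, and 'x' is lost.
plus_lazy = re.findall(r'"(.+?)"', text)

print(star_lazy)   # ['', 'x']
print(star_class)  # ['', 'x']
print(plus_lazy)   # ['" b=']
```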
import re

# file holds the path to your data file
with open(file) as f:
    for line in f:
        if '"' in line:
            print(re.findall(r'"(.*?)"', line))
Or:
with open(file) as f:
    for line in f:
        if '"' in line:
            print(re.findall(r'"([^"]*)"', line))
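If you want the list-of-lists form from the question rather than printed lines, the file loop can collect the matches instead. A minimal sketch; read_quoted_values is a hypothetical helper name, and the demo writes the question's sample data to a temporary file only so the example runs on its own:

```python
import os
import re
import tempfile

QUOTED = re.compile(r'"([^"]*)"')

def read_quoted_values(path):
    """Hypothetical helper: one list of quoted values per line that has any."""
    rows = []
    with open(path) as f:
        for line in f:
            values = QUOTED.findall(line)
            if values:  # skip lines such as "the data starts"
                rows.append(values)
    return rows

# Demo: write the sample data to a temporary file, then read it back.
sample = '''the data starts
test Age="0" Order="51" Doctor-ID="XX2342"
test Age="0" Order="53" Doctor-ID="XX2342"
end of data
'''
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    rows = read_quoted_values(path)
finally:
    os.remove(path)

print(rows)  # [['0', '51', 'XX2342'], ['0', '53', 'XX2342']]
```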
Answer 2 (score: 0)
lines = [
    'test Age="0" Order="51" Doctor-ID="XX2342"',
    'test Age="0" Order="53" Doctor-ID="XX2342"',
]
for line in lines:
    # split on the quote character and keep every second piece
    values = line.split('"')[1::2]
    print(values)
Prints:
['0', '51', 'XX2342']
['0', '53', 'XX2342']
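Explanation:
I split each line on the double-quote character, then use slicing to pull out the odd-indexed elements of the result.
Slice notation is start:end:step. Here we start at index 1, run to the end, and step two indices at a time. Because each quote switches between text outside and text inside the quotes, the odd-indexed pieces are exactly the quoted values.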
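Printing the intermediate list for the first sample line makes the even/odd pattern visible (this assumes the quotes are balanced and unescaped, as in the question's data):

```python
line = 'test Age="0" Order="51" Doctor-ID="XX2342"'

parts = line.split('"')
print(parts)
# ['test Age=', '0', ' Order=', '51', ' Doctor-ID=', 'XX2342', '']

# Even indices hold the text outside the quotes, odd indices the text inside.
outside = parts[0::2]
inside = parts[1::2]
print(outside)  # ['test Age=', ' Order=', ' Doctor-ID=', '']
print(inside)   # ['0', '51', 'XX2342']
```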
This approach will not work as expected if the values contain escaped quotes.
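A short illustration of that failure, on a hypothetical line (not from the question) whose first value contains backslash-escaped quotes. The alternative shown is a different technique swapped in here: a regex that treats a backslash plus the following character as one unit, so \" does not end the value:

```python
import re

# Hypothetical input: the note value contains backslash-escaped quotes.
line = r'note="say \"hi\"" id="1"'

# The split-and-slice trick cuts the value apart at the escaped quotes.
naive = line.split('"')[1::2]
print(naive)  # ['say \\', '', '1']

# Inside the quotes, accept either a non-quote, non-backslash character
# or a backslash followed by any character.
escaped = re.findall(r'"((?:[^"\\]|\\.)*)"', line)
print(escaped)  # ['say \\"hi\\"', '1']
```

The backslashes are still present in the extracted value; unescaping them, if needed, is a separate step.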
A very quick slicing example:
>>> L = list(range(10))
>>> L[1::2]
[1, 3, 5, 7, 9]
>>> L[::2]
[0, 2, 4, 6, 8]