From a text file

Time: 2015-05-29 03:55:42

Tags: python xml string list

Suppose I have a text file that I'm reading in Python, which says:

the data starts
test Age="0" Order="51" Doctor-ID="XX2342"
test Age="0" Order="53" Doctor-ID="XX2342"
end of data

What code would return:

"0" "51" "XX2342"
"0" "53" "XX2342"

Returning a list is fine too:

[["0","51","XX2342"],
 ["0","53","XX2342"]]

Thanks!

3 answers:

Answer 0 (score: 1)

This is a perfect job for regular expressions:

>>> import re
>>> line = 'test Age="0" Order="51" Doctor-ID="XX2342"'
>>> re.findall(r'"(.*?)"', line)
['0', '51', 'XX2342']

To handle multiple lines:

import re

lines = '''
test Age="0" Order="51" Doctor-ID="XX2342"
test Age="0" Order="53" Doctor-ID="XX2342"
'''
results = []
for line in lines.split('\n'):
    # findall returns [] for lines without quoted values (e.g. blank lines)
    result = re.findall(r'"(.*?)"', line)
    if result:
        results.append(result)

for result in results:
    print(result)

This gives:

['0', '51', 'XX2342']
['0', '53', 'XX2342']
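Since the question starts from a text file, here is a minimal sketch that applies the same regex line by line and returns the list of lists directly. The helper name extract_quoted is hypothetical, and io.StringIO stands in for a real file opened with open(); lines without quotes (such as "the data starts") produce no matches and are skipped.

```python
import io
import re

# Same non-greedy pattern as above: capture everything between a pair of quotes.
QUOTED = re.compile(r'"(.*?)"')

def extract_quoted(lines):
    """Hypothetical helper: one list of quoted values per line that has any."""
    results = []
    for line in lines:
        values = QUOTED.findall(line)
        if values:  # skips lines such as "the data starts" / "end of data"
            results.append(values)
    return results

# io.StringIO stands in for open('your_file.txt') in this sketch.
sample = io.StringIO(
    'the data starts\n'
    'test Age="0" Order="51" Doctor-ID="XX2342"\n'
    'test Age="0" Order="53" Doctor-ID="XX2342"\n'
    'end of data\n'
)
print(extract_quoted(sample))
# [['0', '51', 'XX2342'], ['0', '53', 'XX2342']]
```

With a real file you would pass the open file object instead, e.g. inside a `with open(path) as f:` block call extract_quoted(f).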

Answer 1 (score: 1)

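Use either .*? or [^"]* between the quotes, so that a pair of double quotes with nothing between them is also matched (yielding an empty string).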

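import re

filename = 'data.txt'  # hypothetical: path to your text file

# Option 1: non-greedy match of anything between quotes
with open(filename) as f:
    for line in f:
        if '"' in line:
            print(re.findall(r'"(.*?)"', line))

# Option 2: match any run of non-quote characters between quotes
with open(filename) as f:
    for line in f:
        if '"' in line:
            print(re.findall(r'"([^"]*)"', line))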
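A quick check of the empty-string point above, using a made-up line where one attribute value is empty (the input is hypothetical, not from the question): both patterns return '' for the empty pair of quotes.

```python
import re

# Hypothetical input: Age has an empty value.
line = 'test Age="" Order="51" Doctor-ID="XX2342"'

lazy = re.findall(r'"(.*?)"', line)       # non-greedy "anything"
negated = re.findall(r'"([^"]*)"', line)  # "anything except a quote"

print(lazy)     # ['', '51', 'XX2342']
print(negated)  # ['', '51', 'XX2342']
```

The difference is in how each one stops: .*? relies on being non-greedy to stop at the first closing quote, while [^"]* simply cannot consume a quote character, so it stops there regardless of greediness.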

Answer 2 (score: 0)

lines = [
    'test Age="0" Order="51" Doctor-ID="XX2342"',
    'test Age="0" Order="53" Doctor-ID="XX2342"',
]

for line in lines:
    # split on the quote character; odd indices hold the quoted values
    values = line.split('"')[1::2]
    print(values)

Prints:

['0', '51', 'XX2342']
['0', '53', 'XX2342']

Explanation:

I split each line on the double-quote character, then use slicing to pull out the odd-indexed elements of the result.

Slice notation is start:end:step. Here we start at index 1, run to the end, and step two indices at a time. This picks out the items that were inside the quotes.
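To make the split-and-slice step concrete, here is the intermediate list for the first sample line. Splitting on '"' alternates between text outside the quotes (even indices) and text inside them (odd indices), which is why [1::2] selects the values.

```python
line = 'test Age="0" Order="51" Doctor-ID="XX2342"'

parts = line.split('"')
print(parts)
# ['test Age=', '0', ' Order=', '51', ' Doctor-ID=', 'XX2342', '']
# (the trailing '' appears because the line ends with a quote)

print(parts[1::2])  # odd indices: the quoted values
# ['0', '51', 'XX2342']
```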

This approach won't work as expected if your data contains escaped quotes.
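If the data might contain backslash-escaped quotes (an assumption; the sample in the question has none), one common workaround is a regex that treats a backslash plus the following character as a single unit inside the quotes. This is a swapped-in regex technique, not part of this answer, and it keeps the backslashes in the captured text.

```python
import re

# Inside the quotes, accept either a character that is neither a quote nor a
# backslash, or a backslash followed by any character (e.g. \" or \\).
QUOTED_WITH_ESCAPES = re.compile(r'"((?:[^"\\]|\\.)*)"')

# Hypothetical line containing escaped quotes.
line = r'test Note="say \"hi\"" Order="51"'

print(line.split('"')[1::2])              # ['say \\', '', '51']  (split breaks on \")
print(QUOTED_WITH_ESCAPES.findall(line))  # ['say \\"hi\\"', '51']
```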

A very quick slicing example (bonus tutorial link):

>>> L = list(range(10))
>>> L[1::2]
[1, 3, 5, 7, 9]

>>> L = list(range(10))
>>> L[::2]
[0, 2, 4, 6, 8]