Question

What techniques or modules would you use to parse specific parts of a string? Given lines of the form:

field 1: dog        field 2: first        comment: outstanding
field 1: cat        field 2:              comment: some comment about the cat

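Field names always end with a colon, field values may be empty, and fields are separated only by spaces. I only want to access the field values. I know how to do this with a regular expression, but I'm sure there is a more elegant way to do it in Python.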

Answer 1

This looks like a fixed-width format.

If so, you can do this:

data = {}
# (start, end) column slices for the three fixed-width fields
ss = ((0, 19), (20, 41), (42, 80))
with open('/tmp/p.txt') as f:
    for n, line in enumerate(f):
        fields = {}
        for i, j in ss:
            # Split on the first colon only, so values may contain colons
            name, _, value = line[i:j].partition(':')
            fields[name.strip()] = value.strip()
        data[n] = fields

print(data)

Prints:

{0: {'field 1': 'dog', 'field 2': 'first', 'comment': 'outstanding'}, 1: {'field 1': 'cat', 'field 2': '', 'comment': 'some comment about the cat'}}

If you want a list instead:

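data = []
# (start, end) column slices for the three fixed-width fields
ss = ((0, 19), (20, 41), (42, 80))
with open('/tmp/p.txt') as f:
    for line in f:
        fields = {}
        for i, j in ss:
            # Split on the first colon only, so values may contain colons
            name, _, value = line[i:j].partition(':')
            fields[name.strip()] = value.strip()
        data.append(fields)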

Either way, you can access a value like this:

>>> data[0]['comment']
'outstanding'
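
If the column widths turn out not to be fixed, one possible alternative is to slice between the field labels themselves rather than at hard-coded offsets. The sketch below is only an illustration: the helper name `parse_values` and its `names` parameter are hypothetical, and it assumes every line contains all three labels in the same order and that no value contains the text of a label.

```python
def parse_values(line, names=("field 1", "field 2", "comment")):
    """Return the field values of one line, in the order given by `names`."""
    # Locate each label ("name:"); str.index raises ValueError if one is missing
    starts = [line.index(name + ":") for name in names]
    # Each value runs from the end of its label to the start of the next label
    ends = starts[1:] + [len(line)]
    return [
        line[start + len(name) + 1:end].strip()
        for name, start, end in zip(names, starts, ends)
    ]


lines = [
    "field 1: dog        field 2: first        comment: outstanding",
    "field 1: cat        field 2:              comment: some comment about the cat",
]
for line in lines:
    print(parse_values(line))
```

This returns only the values, which is what the question asks for, and an empty field simply comes back as an empty string.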

Answer 2

Something like this:

>>> with open("abc") as f:
...     lis = []
...     for line in f:
...         # fields are separated by runs of at least 8 spaces
...         lis.append(dict(map(str.strip, x.split(":", 1)) for x in line.split(" " * 8)))
...
>>> lis
[{'field 1': 'dog', 'field 2': 'first', 'comment': 'outstanding'},
 {'field 1': 'cat', 'field 2': '', 'comment': 'some comment about the cat'}
]

>>> lis[0]['comment']    # access 'comment' on line 1
'outstanding'
>>> lis[1]['field 2']    # access 'field 2' on line 2
''
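
Splitting on exactly eight spaces depends on the gap widths in the sample: a gap of exactly 16 spaces, for instance, would produce an empty chunk that cannot be turned into a key/value pair. A slightly more tolerant sketch splits on any run of two or more whitespace characters instead. It does use a small regular expression, and it assumes that values never contain two consecutive spaces; the name `parse_line` is hypothetical.

```python
import re


def parse_line(line):
    """Split one line into a {name: value} dict on runs of 2+ whitespace characters."""
    record = {}
    for chunk in re.split(r"\s{2,}", line.strip()):
        # Split on the first colon only, so values may contain colons
        name, _, value = chunk.partition(":")
        record[name.strip()] = value.strip()
    return record


sample = [
    "field 1: dog        field 2: first        comment: outstanding",
    "field 1: cat        field 2:              comment: some comment about the cat",
]
rows = [parse_line(line) for line in sample]
print(rows[1]["field 2"] == "")  # the empty field is kept as an empty string
```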

Answer 3

Another option is to use the csv module.

Assuming the fields are separated by a tab character:

import csv
import io

# Tabs are written as \t so the delimiter stays visible
input_data = io.StringIO(
    "field 1: dog\tfield 2: first\tcomment: outstanding\n"
    "field 1: cat\tfield 2:\tcomment: some comment about the cat\n"
)

data = []
for row in csv.reader(input_data, delimiter="\t"):
    line = {}
    for item in row:
        # Split on the first colon only, so values may contain colons
        name, _, value = item.partition(":")
        line[name.strip()] = value.strip()
    data.append(line)

print(data)

Prints:

[{'field 1': 'dog', 'field 2': 'first', 'comment': 'outstanding'}, {'field 1': 'cat', 'field 2': '', 'comment': 'some comment about the cat'}]
