I have a txt file with the following structure:
start
id=1
date=21.05.2018
summ=500
end
start
id=7
date=23.05.2018
summ=500
owner=guest
end
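I need to parse it into a list of dictionaries (str: str; even if a value is an int or a date, it should be converted to a string). That is, split the file on start and end, and then split each line on the = sign. The number of lines between start and end may vary.
But I can't manage to implement it on my own. I tried something like this: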
d = {}
arr = []
ind = 0
for line in plines:
    ind = ind + 1
    if 'startpayment' in line:
        print('ind = ' + str(ind))
        for i in range(ind, len(plines)):
            print(i)
            key, value = plines[i].strip().split('=')
            if type(value) == 'str':
                d[key] = str(value)
            elif type(value) == 'int':
                d[key] = int(value)
            arr.append(d)
            if 'endpayment' in line:
                break
Can anyone help me? Thanks.
Answer 0 (score: 1)
Use a regular expression.
import re

with open(filename, "r") as infile:
    data = infile.read()

# Find the text between each "start" and the next "end"
data = re.findall(r"(?<=\bstart\b).*?(?=\bend\b)", data, flags=re.DOTALL)

r = []
for i in data:
    val = filter(None, i.split("\n"))  # drop empty lines
    d = {}
    for j in val:
        s = j.split("=")  # split on "=" to form a key-value pair
        d[s[0]] = s[1]
    r.append(d)  # append the finished dict to the list
print(r)
Output:
[{'date': '21.05.2018', 'summ': '500', 'id': '1'}, {'date': '23.05.2018', 'owner': 'guest', 'summ': '500', 'id': '7'}]
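One caveat with the pattern above: `\bend\b` matches the word "end" anywhere, so a value that contains it as a standalone word (for example `owner=end user`) would cut the block short. A minimal sketch of a line-anchored variant follows. The names `BLOCK_RE` and `parse_blocks` and the sample text are hypothetical, and it assumes `start` and `end` each sit on a line of their own and that lines end in `\n`.

```python
import re

# Hypothetical names; "start" and "end" are assumed to be alone on their lines.
# re.MULTILINE makes ^ and $ match at line boundaries, re.DOTALL lets . span lines.
BLOCK_RE = re.compile(r"^start[ \t]*$(.*?)^end[ \t]*$", flags=re.DOTALL | re.MULTILINE)


def parse_blocks(text):
    """Return one dict per start/end block, keeping every value as a string."""
    records = []
    for body in BLOCK_RE.findall(text):
        record = {}
        for line in body.splitlines():
            line = line.strip()
            if not line:
                continue
            key, value = line.split("=", 1)  # split on the first "=" only
            record[key] = value
        records.append(record)
    return records


sample = "start\nid=1\ndate=21.05.2018\nsumm=500\nend\nstart\nid=7\nowner=end user\nend\n"
print(parse_blocks(sample))
```

Splitting with `split("=", 1)` also keeps values that themselves contain an `=` sign intact.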
Answer 1 (score: 0)
You can build a simple parser using recursion that looks for the data between start and end blocks:
import re

class Parser:
    def __init__(self, source: str):
        # Iterate over the non-empty lines of the input
        self.source = iter(filter(None, source.split('\n')))
        self.results = []
        self.parse()

    @staticmethod
    def to_dict(between_blocks):
        # Split each line on '=' (ignoring surrounding whitespace) into a key-value pair
        return dict(re.split(r'\s*=\s*', i) for i in between_blocks)

    def parse(self):
        _line = next(self.source, None)
        if _line is not None:
            if _line == 'start':
                scope = []
                while True:
                    _temp = next(self.source, None)
                    if _temp is None:
                        raise Exception("Missing 'end' tag")
                    if _temp != 'end':
                        scope.append(_temp)
                    else:
                        break
                self.results.append(Parser.to_dict(filter(None, scope)))
            self.parse()  # continue with the next line

    def __repr__(self):
        return f'{self.__class__.__name__}({self.results})'

print(Parser(open('filename.txt').read()).results)
Output:
[{'id': '1', 'date': '21.05.2018', 'summ': '500'}, {'id': '7', 'date': '23.05.2018', 'summ': '500', 'owner': 'guest'}]
Tests:
tests = [
    [
"""
start
id=1
date=21.05.2018
summ=500
""", Exception],
    [
"""
start
name = someone
age = 18
id = 23
end
start
name = someoneelse
age = 45
id = 55
end
start
name = lastname
age = 34
id = 5
end
""", None]
]
for text, is_error in tests:
    try:
        _ = Parser(text)
    except Exception:
        assert is_error == Exception
    else:
        assert is_error is None
print('all tests passed')
Output:
all tests passed
Answer 2 (score: 0)
If I understood the question correctly, this is the simplest algorithm I can think of.
arr = []
d = {}
for line in plines:  # plines: list of lines, already stripped of newlines
    if line == 'start':
        d = {}  # start a fresh dict for every block
    elif line == 'end':
        arr.append(d)
    else:
        key, value = line.split('=', 1)
        d[key] = value  # values stay strings, as the question asks
print(arr)
Output:
[{'id': '1', 'date': '21.05.2018', 'summ': '500'}, {'id': '7', 'date': '23.05.2018', 'summ': '500', 'owner': 'guest'}]
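A pitfall worth knowing with this kind of loop: `arr.append(d)` stores a reference to the dict, not a copy. If a single dict is created once and reused for every block, all list entries end up pointing at the same object and show the values of the last block. A small sketch of the difference (the `lines` list and the variable names are made up for illustration):

```python
lines = ["start", "id=1", "end", "start", "id=7", "end"]

# Reusing one dict: both list entries refer to the same object.
shared = {}
reused = []
for line in lines:
    if line == "end":
        reused.append(shared)
    elif line != "start":
        key, value = line.split("=", 1)
        shared[key] = value

# Creating a new dict at every "start": each record is independent.
fresh = []
current = None
for line in lines:
    if line == "start":
        current = {}
    elif line == "end":
        fresh.append(current)
    else:
        key, value = line.split("=", 1)
        current[key] = value

print(reused)  # [{'id': '7'}, {'id': '7'}]
print(fresh)   # [{'id': '1'}, {'id': '7'}]
```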
Answer 3 (score: 0)
You can also try something like this:
from itertools import takewhile

with open('data.txt') as in_file:
    items = [line.strip() for line in in_file.read().split()]
# ['start', 'id=1', 'date=21.05.2018', 'summ=500', 'end', 'start', 'id=7', 'date=23.05.2018', 'summ=500', 'owner=guest', 'end']

# Positions of every 'start' marker
pos = [i for i, item in enumerate(items) if item == 'start']
# [0, 5]

# For each 'start', take the items that follow it up to the next 'end'
blocks = [list(takewhile(lambda x: x != 'end', items[i+1:])) for i in pos]
# [['id=1', 'date=21.05.2018', 'summ=500'], ['id=7', 'date=23.05.2018', 'summ=500', 'owner=guest']]

print([dict(x.split('=') for x in block) for block in blocks])
Which outputs:
[{'id': '1', 'date': '21.05.2018', 'summ': '500'}, {'id': '7', 'date': '23.05.2018', 'summ': '500', 'owner': 'guest'}]
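Answer 4 (score: 0)
You can simply parse the text file line by line, provided you keep some context: start a new dictionary on each start line, and append it to the list on each end line.
The code could be: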
def parse(fd):
    """Parse a file, fd is expected to be a file object"""
    resul = []  # the list of dictionaries to return
    d = None    # an individual dict initialized to None
    linenum = 0
    for line in fd:
        line = line.strip()
        linenum += 1
        if line.startswith('end'):
            if d is not None:
                resul.append(d)
                d = None
        elif line.startswith('start'):
            d = {}
        elif len(line) != 0:
            key, val = line.split('=', 1)
            d[key] = val
    return resul
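Syntax errors in the file are not handled here (missing start or end lines, other incorrect lines): for example, an incorrect line (one without an = sign) should raise the exception ValueError: not enough values to unpack (expected 2, got 1).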
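If explicit error reporting is wanted, one possible variant is sketched below. The name `parse_strict` and the error messages are made up for illustration, and unlike parse() above it compares whole lines to `start` and `end` instead of using startswith, which is a design choice rather than a requirement.

```python
import io


def parse_strict(fd):
    """Like parse() above, but raises ValueError on malformed input."""
    result = []
    current = None  # None means "not inside a start/end block"
    for linenum, raw in enumerate(fd, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line == "start":
            if current is not None:
                raise ValueError(f"line {linenum}: 'start' inside an open block")
            current = {}
        elif line == "end":
            if current is None:
                raise ValueError(f"line {linenum}: 'end' without 'start'")
            result.append(current)
            current = None
        else:
            if current is None:
                raise ValueError(f"line {linenum}: data outside a block")
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"line {linenum}: missing '=' in {line!r}")
            current[key] = value
    if current is not None:
        raise ValueError("missing 'end' for the last block")
    return result


# io.StringIO stands in for an open file object here.
print(parse_strict(io.StringIO("start\nid=1\nsumm=500\nend\n")))
```

Because every error message carries the line number, a broken input file can be fixed without guessing where the parser gave up.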