I'd like to write a Python method that reads a text file of key-value pairs:
FOO=BAR
BUZ=BLEH
I'd also like to support newlines, both through quoting with \n and through here-docs:
MULTILINE1="This\nis a test"
MULTILINE2= <<DOC
This
is a test
DOC
While the first is easy to implement, I'm struggling with the second. Is there something in Python's stdlib (e.g. shlex) that I could already use for this?
Answer 0 (score: 1)
Contents of "test.txt":
FOO=BAR
BUZ=BLEH
MULTILINE1="This\nis a test"
MULTILINE2= <<DOC
This
is a test
DOC
Function:
def read_strange_file(filename):
    with open(filename) as f:
        file_content = f.read().splitlines()
    res = {}
    key, value, delim = "", "", ""
    for line in file_content:
        if "=" in line and not delim:
            # split on the first "=" only, so values may contain "="
            key, value = line.split("=", 1)
            if value.strip(" ").startswith("<<"):
                delim = value.strip(" ")[2:]  # extracting delimiter keyword
                value = ""
                continue
        if not delim or (delim and line == delim):
            if value.startswith("\"") and value.endswith("\""):
                # [1: -1] delete quotes, then turn escapes like \n into real characters
                value = bytes(value[1: -1], "utf-8").decode("unicode_escape")
            if delim:
                value = value[:-1]  # delete trailing "\n" of the here-doc body
            res[key] = value
            delim = ""
        if delim:
            # inside a here-doc: collect the line
            value += line + "\n"
    return res
Usage:
result = read_strange_file("test.txt")
print(result)
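The same line-by-line idea can also be sketched with a shared iterator, so that the here-doc body is consumed in an inner loop instead of being tracked through a delim flag. This is only a sketch under a few assumptions that are not in the answer above: the name parse_pairs is hypothetical, the input is passed as a string rather than a filename, lines without "=" outside a here-doc are skipped, and quoted values are valid Python string literals so that ast.literal_eval can decode them.

```python
import ast


def parse_pairs(text):
    """Parse KEY=VALUE lines, double-quoted values, and <<DELIM here-docs."""
    result = {}
    lines = iter(text.splitlines())
    for line in lines:
        if "=" not in line:
            continue  # assumption: ignore blank or malformed lines
        key, _, value = line.partition("=")  # split on the first "=" only
        key, value = key.strip(), value.strip()
        if value.startswith("<<"):
            delim = value[2:].strip()
            body = []
            # Consume lines from the same iterator until the closing delimiter.
            for doc_line in lines:
                if doc_line == delim:
                    break
                body.append(doc_line)
            value = "\n".join(body)
        elif len(value) >= 2 and value[0] == value[-1] == '"':
            # Assumption: the quoted value is a valid Python string literal.
            value = ast.literal_eval(value)
        result[key] = value
    return result


sample = r'''FOO=BAR
BUZ=BLEH
MULTILINE1="This\nis a test"
MULTILINE2= <<DOC
This
is a test
DOC
'''

print(parse_pairs(sample))
```

Using ast.literal_eval avoids the round trip through bytes, but it raises an error on values that are not well-formed Python literals, so it may be stricter than what the file format is meant to allow.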
Answer 1 (score: -1)
I'm assuming this is the test string (i.e., there is an invisible \n character at the end of each line):
s = ''
s += 'MULTILINE1="This\nis a test"\n'
s += 'MULTILINE2= <<DOC\n'
s += 'This\n'
s += 'is a test\n'
s += 'DOC\n'
The best I could do was to cheat with NumPy:
import numpy as np
A = np.asarray([ss.rsplit('\n', 1) for ss in ('\n'+s).split('=')])
keys = A[:-1,1].tolist()
values = A[1:,0].tolist()
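# optionally parse here-documents
di = 'DOC'  # delimiting identifier
# removeprefix/removesuffix (Python 3.9+) remove the exact marker; lstrip/rstrip would strip any of its characters
values = [v.strip().removeprefix('<<%s\n' % di).removesuffix('\n%s' % di) for v in values]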
print('Keys: ', keys)
print('Values: ', values)
#if you want a dictionary:
d = dict( zip(keys, values) )
The result is:
Keys:  ['MULTILINE1', 'MULTILINE2']
Values:  ['"This\nis a test"', 'This\nis a test']
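This works by sneakily prepending a \n character to the string, then splitting the whole string on = characters, and finally using rsplit to keep each value in one piece, even when the value contains several \n characters. Note that it breaks if a value itself contains an = character. Printing the array A makes things clearer: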
[['', 'MULTILINE1'],
['"This\nis a test"', 'MULTILINE2'],
[' <<DOC\nThis\nis a test\nDOC', '' ]]
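The split/rsplit steps described above do not strictly need NumPy. The following is a minimal pure-Python sketch of the same trick; the function name split_pairs is hypothetical, and it assumes that the string ends with "\n" and that no value contains an = character. It leaves the here-doc markers in place, as the array A does.

```python
def split_pairs(s):
    # Prepend "\n" so that the first key is also preceded by a newline.
    chunks = ("\n" + s).split("=")
    # Each chunk looks like "<previous value>\n<next key>"; rsplit with
    # maxsplit=1 cuts only at the last newline, so newlines inside the
    # value are preserved.
    pairs = [chunk.rsplit("\n", 1) for chunk in chunks]
    keys = [pair[1] for pair in pairs[:-1]]    # right halves, except the last chunk
    values = [pair[0] for pair in pairs[1:]]   # left halves, except the first chunk
    return dict(zip(keys, values))


s = ''
s += 'MULTILINE1="This\nis a test"\n'
s += 'MULTILINE2= <<DOC\n'
s += 'This\n'
s += 'is a test\n'
s += 'DOC\n'

d = split_pairs(s)
print(d)
```

The values still carry their quotes and the <<DOC ... DOC wrapper, so a clean-up step like the optional one in the answer would follow.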