I have a text file in the following format:
BEGIN *A information here* END
BEGIN *B information here* END
BEGIN *C information here*
*C additional information here*
*C additional information here*
BEGIN *C secondary information here*
*C additional secondary information*
BEGIN *C tertiary information* END
END
BEGIN *C secondary information*
END
END
BEGIN *D information here* END
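I want to read the information between BEGIN and END and keep it in the same structure, as a list of lists. I tried replacing 'BEGIN' and 'END' with '[' and ']' respectively and then evaluating the resulting string, but it throws a syntax error when it hits a number in the information. This is the code I tried: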
with open(filepath) as infile:
    mylist = []
    for line in infile:
        line = line.strip()
        line = line.replace('BEGIN', '[')
        line = line.replace('END', ']')
        mylist.append(line)
    for n in mylist:
        print(n)
This produces:
[ *A information here* ]
[ *B information here* ]
[ *C information here*
*C additional information here*
*C additional information here*
[ *C secondary information here*
*C additional secondary information*
[ *C tertiary information* ]
]
[ *C secondary information*
]
]
[ *D information here* ]
Is there a way to get the data out as a list of lists, like this:
>>> for n in mylist:
...     print(n)
[*A information here*]
[*B information here*]
[*C information here* *C additional information here* [*C secondary information here* *C additional secondary information* [*C tertiary information*]] [*C secondary information*]]
[*D information here*]
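Answer 0 (score: 0)
Assuming the file doesn't contain any brackets, you can replace "BEGIN" and "END" with brackets as you did, then write a recursive function to parse the result: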
def parse(text):
    j = -1  # index of the last character already consumed by a nested block
    result = [""]  # initialize a list to store the result
    for i in range(len(text)):  # iterate over indices of characters
        if i <= j:
            continue  # already handled by a recursive call; don't parse it twice
        if text[i] == "[":
            nestlevel = 1  # number of nested blocks we are currently inside
            j = i
            while nestlevel != 0:  # loop until outside all nested blocks
                j += 1
                # increment or decrement nest level on encountering brackets
                # (raises IndexError if the brackets are unbalanced)
                if text[j] == "[":
                    nestlevel += 1
                if text[j] == "]":
                    nestlevel -= 1
            # data block goes from index i+1 to index j-1
            result.append(parse(text[i+1:j]))  # slicing doesn't include end bound element
            result.append("")
        else:
            result[-1] = result[-1] + text[i]
    return result
with open(filepath) as f:
    data = parse(f.read().replace("BEGIN", "[").replace("END", "]"))
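This is only a rough idea, and I'm sure it can be optimized and improved in other ways. Note that it can return empty strings wherever there is no text between sublists.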
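The empty strings mentioned above can be stripped in a second pass. Here is a minimal sketch, assuming the result is a nested list whose leaves are all strings. The name `prune` is a hypothetical helper, not part of the code above:

```python
def prune(node):
    """Return a copy of a nested list with whitespace-only strings removed."""
    cleaned = []
    for item in node:
        if isinstance(item, list):
            cleaned.append(prune(item))  # recurse into sublists
        elif item.strip():
            cleaned.append(item.strip())  # keep non-blank text, trimmed
    return cleaned


sample = ["", [" A "], "\n", [" C ", [" D "], ""], ""]
print(prune(sample))  # [['A'], ['C', ['D']]]
```

Empty sublists are kept on purpose, so a block with nothing between its markers still shows up as `[]`.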
Answer 1 (score: 0)
I managed to get it working with the following code:
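def getObjectData(filepath):
    with open(filepath) as infile:
        mylist = []
        linenum = 0
        varcount = 0
        varlinedic = {}  # block number -> 1-based line number where the block starts
        for line in infile:
            line = line.replace('BEGIN', '[').replace('END', ']')
            linenum += 1
            # assumes only top-level blocks start at column 0 (nested lines are indented)
            if line.startswith('['):
                varcount += 1
                varlinedic[varcount] = linenum
            mylist.append(line.strip())
    for key in sorted(varlinedic):
        start = varlinedic[key] - 1  # 0-based index of the block's first line
        # the block runs up to the next top-level block, or to the end of the file
        end = varlinedic[key + 1] - 1 if key + 1 in varlinedic else len(mylist)
        print(mylist[start:end])

# the function returns nothing, so this call also prints None
print(getObjectData(filepath))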
It prints:
['[ *A information here* ]']
['[ *B information here* ]']
['[ *C information here*', '*C additional information here*', '*C additional information here*', '[ *C secondary information here*', '*C additional secondary information*', '[ *C tertiary information* ]', ']', '[ *C secondary information*', ']', ']']
['[ *D information here* ]']
None
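The output above groups the lines of each top-level block, but the nested blocks are still flat strings. One way to get real nested lists is to split on the markers and track the current list with an explicit stack, which avoids both the bracket replacement and evaluating the string. This is a rough sketch, not a drop-in replacement: `parse_blocks` is a hypothetical name, and it assumes the words BEGIN and END never appear inside the information itself.

```python
import re


def parse_blocks(text):
    """Build nested lists from BEGIN/END markers using an explicit stack."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    # The capturing group keeps the BEGIN/END markers in the split result.
    for token in re.split(r"\b(BEGIN|END)\b", text):
        if token == "BEGIN":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif token == "END":
            if len(stack) == 1:
                raise ValueError("END without a matching BEGIN")
            stack.pop()
        else:
            # Store each non-blank line of text as its own string.
            for line in token.splitlines():
                if line.strip():
                    stack[-1].append(line.strip())
    if len(stack) != 1:
        raise ValueError("BEGIN without a matching END")
    return root


sample = "BEGIN A END\nBEGIN C\n  C more\n  BEGIN C2 END\nEND\n"
print(parse_blocks(sample))  # [['A'], ['C', 'C more', ['C2']]]
```

For a file, something like `parse_blocks(open(filepath).read())` would return one sublist per top-level block.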