I have a text file like this:
APPENDIX -- GLOSSARY
-------------------------------------------------------------------
  Asymmetrical Encryption:
    Encryption using a pair of keys--the first encrypts a
  Big-O Notation, Complexity:
    Big-O notation is a way of describing the governing.
    In noting complexity orders, constants and multipliers are
    conventionally omitted, leaving only the dominant factor.
    Compexities one often sees are:
    #*------------- Common Big-O Complexities ---------------#
    O(1) constant
  Birthday Paradox:
    The name "birthday paradox" comes from the fact--surprising
  Cyclic Redundancy Check (CRC32):
    See Hash. Based on mod 2 polynomial operations, CRC32 produces a
    32-bit "fingerprint" of a set of data.
  Idempotent Function:
    The property that applying a function to its return value
    'G=lambda x:F(F(F(...F(x)...)))'.
I want to parse the text file so that the output looks like this:
{'Asymmetrical Encryption': 'Encryption using a pair of keys--the first encrypts a',
 'Big-O Notation, Complexity': 'Big-O notation is a way of describing the governing. In noting complexity orders, constants and multipliers are conventionally omitted, leaving only the dominant factor. Compexities one often sees are: #*------------- Common Big-O Complexities ---------------# O(1) constant', ...and so on}
This is what I have done:
dic = {}
with open('appendix.txt', 'r') as f:
    data = f.read()
    lines = data.split(':\n\n')
    for line in lines:
        res = line.split(':\n ')
        field = res[0]
        val = res[1:]
        dic[field] = val
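Besides the header getting in the way, this trips over the ':' characters that appear inside the text values, so the output is incorrect.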
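Answer 0 (score: 0)
If you want to parse the text based on its leading spaces (two spaces before a title, four before a content line), you can use a script like this: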
class spaceParser(object):
    def __init__(self):
        # per-instance state, so separate parsers do not share one dict
        self.result = {}
        self.last_title = ""
        self.last_content = ""

    def process_content(self, content_line):
        if self.last_title:
            # join wrapped lines with a single space
            self.last_content = (self.last_content + " " + content_line.strip()).strip()
            self.result[self.last_title] = self.last_content

    def process_title(self, content_line):
        # drop the trailing colon so the key matches the desired output
        self.last_title = content_line.strip().rstrip(":")
        self.last_content = ""

    def parse(self, raw_file):
        for line in raw_file:
            # look for patterns based on indentation
            if line[0:4] == "    ":
                # content type
                self.process_content(line)
            elif line[0:2] == "  ":
                # title type
                self.process_title(line)
            else:
                # other types (header, rule, blank lines)
                pass
        # append the last one
        self.process_content("")

parser = spaceParser()
with open('appendix.txt', 'r') as raw_file:
    parser.parse(raw_file)
print(parser.result)
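
The same indentation rule can also be written as a single function, which is easier to try out on a few lines without a file. This is only a sketch under the same assumption as above (titles indented by two spaces, content by four); the name `parse_glossary` and the `sample` list are made up for illustration and are not part of the question.

```python
def parse_glossary(lines):
    """Map each two-space-indented title to its four-space-indented body."""
    glossary = {}
    title = None
    for line in lines:
        if not line.strip():
            # skip blank or whitespace-only lines
            continue
        if line.startswith("    "):
            # content line: attach it to the most recent title, if any
            if title is not None:
                glossary[title].append(line.strip())
        elif line.startswith("  "):
            # title line: drop the trailing colon and start a new entry
            title = line.strip().rstrip(":")
            glossary[title] = []
        # anything else (header, rule) is ignored
    # join the wrapped content lines of each entry with single spaces
    return {t: " ".join(parts) for t, parts in glossary.items()}


# hypothetical sample, shaped like the file in the question
sample = [
    "APPENDIX -- GLOSSARY\n",
    "-----\n",
    "\n",
    "  Big-O Notation, Complexity:\n",
    "\n",
    "    Big-O notation is a way of describing the governing.\n",
    "    Compexities one often sees are:\n",
    "\n",
    "  Idempotent Function:\n",
    "\n",
    "    The property that applying a function to its return value\n",
]
result = parse_glossary(sample)
print(result)
```

Because the decision is made on indentation alone, a content line that happens to end in ':' (such as "Compexities one often sees are:") stays part of the content instead of being treated as a new title. With a real file, `parse_glossary(open('appendix.txt'))` should work the same way, since a file object yields lines.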