I have a file laid out like this:
; start item 1
; item 1 line ; start item 2; start item 3
; item 1 line ; item 2 line ; item 3 line ; start item 4
; item 1 line ; item 2 line ; item 3 line ; item 4 line ; start item 5
; item 1 line ; item 2 line
; item 1 line
; item 1 line
; item 6 start
; item 6 line ; item 7 start
; item 6 line ; item 7 line ; item 8 start
; item 6 line ;; item 8 line
; item 6 line
; item 6 line ; item 9 start
; item 6 line ; item 9 line
; item 6 line
; item 6 line ; item 0 start
; item 6 line ; item 0 line
;; item 0 line
;; item 0 line
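(Imagine the columns are different people and the rows are what they say; a row with several columns means several people are talking at the same time.)
I'm trying to parse this so that I can get each item separately, but I have only partly succeeded. Here is my approach: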
def unpacker(File):
    Values = {}
    main_key = 0  # block counter, bumped on every line that has no separator
    sep = ';'
    with open(File) as fn:
        for line in fn:
            if line.count(sep):
                # each cell is stored under the key "<block>_<column index>"
                for i, sub_line in enumerate(line.split(sep)):
                    sub_key = str(main_key) + '_' + str(i)
                    sub_line = sub_line.replace('\n', '')
                    if Values.get(sub_key):
                        Values[sub_key] += ('|' + sub_line)
                    else:
                        Values[sub_key] = sub_line
            else:
                main_key += 1
    for k in Values.keys():
        print(k + ' ---------')
        print(Values[k])
Its output with the sample data:
1_3 ---------
item 8 start| item 8 line
1_2 ---------
item 7 start| item 7 line || item 9 start| item 9 line| item 0 start| item 0 line| item 0 line| item 0 line
1_1 ---------
item 6 start| item 6 line | item 6 line | item 6 line | item 6 line| item 6 line | item 6 line | item 6 line| item 6 line | item 6 line ||
1_0 ---------
0_4 ---------
start item 4| item 4 line
0_5 ---------
start item 5
0_2 ---------
start item 2| item 2 line | item 2 line | item 2 line
0_3 ---------
start item 3| item 3 line | item 3 line
0_0 ---------
0_1 ---------
start item 1| item 1 line | item 1 line | item 1 line | item 1 line | item 1 line | item 1 line
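The items do not already contain a key of their own, so one is assigned to each of them. The length of the text in each row may vary, but the semicolons will always follow this pattern.
This approach works for the first part of the example above (items 1 to 5), but in the second part (item 6 onward) it fails to separate items 7, 9 and 0. It would be fine if 7, 9 and 0 were related, but they are not. This is where I'm stuck: how can I tell these items apart?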
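To make the collision concrete, here is a minimal sketch that lists which cells end up under each column index when rows are split on the semicolon. The helper name `items_per_column` is made up for illustration; the three rows are taken from the second half of the sample.

```python
from collections import defaultdict


def items_per_column(rows, sep=";"):
    """Map each column index to the distinct non-empty cells seen there."""
    seen = defaultdict(list)
    for row in rows:
        for i, cell in enumerate(row.split(sep)):
            cell = cell.strip()
            # Index 0 is always empty because every row starts with the separator.
            if cell and cell not in seen[i]:
                seen[i].append(cell)
    return dict(seen)


rows = [
    "; item 6 line ; item 7 line ; item 8 start",
    "; item 6 line ; item 9 start",
    "; item 6 line ; item 0 start",
]

for index, cells in sorted(items_per_column(rows).items()):
    print(index, cells)
```

Items 7, 9 and 0 all land on column index 2, so a key built only from the block number and the column index cannot tell them apart.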
Answer 0 (score: 1)
Here is some code that handles your example. You will probably have to adapt it to your actual use case:
class Speaker(list):
    def __init__(self):
        list.__init__(self)
        self.activated = True

    def talk(self, string):
        # an empty cell means this speaker has stopped talking
        if self.activated:
            talk = string.replace("\n", "")
            if talk:
                self.append(talk)
            else:
                self.activated = False
        return self.activated


class SpeakerIndex(dict):
    def __init__(self, filepath, separator):
        """ Creation of index """
        dict.__init__(self)
        self.separator = separator
        self.talk = 0
        self.toSpeak = []
        self.hadSpeak = []
        with open(filepath, 'r') as data:
            for line in data:
                ##print("line: ",line)
                ##print("toSpeak: ",self.toSpeak)
                self.speakersFeed(line)
                # save and drop the speakers that should have spoken on this line but did not
                for speaker in self.toSpeak:
                    self.save_speaker(speaker)
                self.toSpeak = self.hadSpeak
                self.hadSpeak = []

    def speakersFeed(self, line):
        """ parse a line """
        if self.separator in line:
            for speaker_action in line.split(self.separator)[1:]:
                ##print("action :",speaker_action)
                speaker = None
                # pick the right speaker: the next one waiting, or a new one
                if self.toSpeak:
                    speaker = self.toSpeak.pop(0)
                else:
                    speaker = Speaker()
                # process the content
                result = speaker.talk(speaker_action)
                ##print("speaker : ",speaker)
                # put the speaker where it belongs depending on its state
                if result:
                    self.hadSpeak.append(speaker)
                else:
                    self.save_speaker(speaker)
        else:
            # a line without separator ends the talk: save speakers that have not ended yet
            for speaker in self.toSpeak:
                self.save_speaker(speaker)
            self.talk += 1

    def speaker_id(self, speaker):
        """ Return a unique id for a speaker """
        # assumes the first cell looks like " item N start", so the token at index 2 is N
        number = int(speaker[0].split(" ")[2])
        return "talk{0}-speaker{1}".format(self.talk, number)

    def save_speaker(self, speaker):
        self[self.speaker_id(speaker)] = speaker
        ##print("saved :",speaker)

    def __str__(self):
        """ Override the str() behaviour """
        keylist = list(self.keys())
        keylist.sort()
        result = "{\n"
        for key in keylist:
            result += "\t" + str(key) + " : " + str(self[key]) + "\n"
        result += "}"
        return result


if __name__ == "__main__":
    index = SpeakerIndex("foo.txt", ";")
    print(str(index))
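You can uncomment the print lines to get an execution trace. The idea behind these classes is to keep a stack of the current speakers at all times.
Running it gives me this: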
python3 ./sof.py
{
talk0-speaker1 : [' item 1 start', ' item 1 line ', ' item 1 line ', ' item 1 line ', ' item 1 line ', ' item 1 line ', ' item 1 line']
talk0-speaker2 : [' item 2 start ', ' item 2 line ', ' item 2 line ', ' item 2 line']
talk0-speaker3 : [' item 3 start', ' item 3 line ', ' item 3 line ']
talk0-speaker4 : [' item 4 start', ' item 4 line ']
talk0-speaker5 : [' item 5 start']
talk1-speaker0 : [' item 0 start', ' item 0 line', ' item 0 line']
talk1-speaker1 : [' item 1 start', ' item 1 line ', ' item 1 line ', ' item 1 line ', ' item 1 line ', ' item 1 line ', ' item 1 line']
talk1-speaker6 : [' item 6 start', ' item 6 line ', ' item 6 line ', ' item 6 line ', ' item 6 line', ' item 6 line ', ' item 6 line ', ' item 6 line', ' item 6 line ', ' item 6 line ']
talk1-speaker7 : [' item 7 start', ' item 7 line ']
talk1-speaker8 : [' item 8 start', ' item 8 line']
talk1-speaker9 : [' item 9 start', ' item 9 line']
}
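When adapting this to real data, one possible variation is to key the active speakers by column position instead of popping them in order. The sketch below is only an illustration under an assumption the question does not state explicitly: an item stays in one column and ends as soon as that column is empty or missing in a row. The function name `parse_items` is made up, and the sample is the second half of the data from the question.

```python
def parse_items(lines, sep=";"):
    """Split rows into items, treating each column position as one speaker slot.

    Assumption: an item lives in a single column and ends as soon as that
    column is empty or absent in a row.
    """
    finished = []  # completed items, in the order they ended
    active = {}    # column index -> cells of the item currently in that column
    for raw in lines:
        line = raw.rstrip("\n")
        # Every row starts with a separator, so the first split field is dropped.
        cells = [c.strip() for c in line.split(sep)[1:]] if sep in line else []
        for col in sorted(set(active) | set(range(len(cells)))):
            text = cells[col] if col < len(cells) else ""
            if text:
                active.setdefault(col, []).append(text)
            elif col in active:
                finished.append(active.pop(col))
    # Items still open at the end of the input are flushed in column order.
    finished.extend(active[col] for col in sorted(active))
    return finished


sample = """\
; item 6 start
; item 6 line ; item 7 start
; item 6 line ; item 7 line ; item 8 start
; item 6 line ;; item 8 line
; item 6 line
; item 6 line ; item 9 start
; item 6 line ; item 9 line
; item 6 line
; item 6 line ; item 0 start
; item 6 line ; item 0 line
;; item 0 line
;; item 0 line
"""

for item in parse_items(sample.splitlines()):
    print(item)
```

Items come back in the order in which they end, and a line without any separator closes every open item, so it also works as a block boundary. If the original speaking order matters, the column index and the starting row could be stored alongside each item.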