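I have a [semi-]structured text file with various headers. One particular header is followed by multiple lines. Carriage returns and page breaks in the file can fall inside the section under that header. I need to add the recipient values to a list.
The file might look like this: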
Id
1236547852012
Time
2017-05-01
Author
mary jane (123654789)
Recipients
peter paul (987456789)
jane jackson (74125896)
Id
2017050145698
Time
2017-04-30
Author
jane jackson (74125896)
Recipients
peter paul (987456789)
\n\r
\n\r
janet jackson (74125896)
fran mckensie (85214796)
\n\r
walter wood (745896369)
Id
4569632587
Time
2017-04-29\n\r
Author
mary jane (123654789)
Recipients
peter paul (987456789)
jane jackson (74125896)
For each message I need a list of recipients as output, and it needs to look like this:
[987456789, 74125896]
[987456789, 74125896, 85214796, 745896369]
[987456789, 74125896]
My code:
    recipientList = []
    with open(inputFile, 'r') as f:
        for line in f:
            if 'Recipients' in line:
                # did this b/c recipient id would be on same line
                lineparts = line.split(' ')
                if len(lineparts) == 3:
                    recipient = line.strip()
                    recipientId = recipient.split('(',1)[1].replace(')','').strip()
                    recipientList.append(recipientId)
                    nextRecip = next(f).strip()
                    if nextRecip:
                        recipID = nextRecip.split('(',1)[1].replace(')','').strip()
                        recipientList.append(recipID)
                        anotherRecip = next(f).strip()
                        if anotherRecip:
                            recipID2 = anotherRecip.split('(',1)[1].replace(')','').strip()
                            recipientList.append(recipID2)
                if len(lineparts) == 2:  # recipient on following line
                    nxtRecipient = next(f).strip()
                    if nxtRecipient:
                        nxtRecipID = nxtRecipient.split('(',1)[1].replace(')','').strip()
                        recipientList.append(nxtRecipID)
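How can I keep capturing the recipient IDs without repeatedly typing next(f)? I want to account for anywhere from 1 to n recipients, as well as page breaks that may fall inside the section under the header, like this: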
Recipients peter paul (987456789)
jane jackson (74125896)
PAGE 2
walter woods(745896369)
...and an unknown number of carriage returns between recipients in the "Recipients" list. I hope this isn't too confusing.
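
One possible way to avoid the repeated next(f) calls is to keep a small piece of state ("am I currently inside a Recipients section?") while looping over the lines once. The sketch below is only an illustration under some assumptions: the function name collect_recipients and the constants are made up for this example, the other headers are assumed to be exactly Id, Time and Author on a line by themselves, and a recipient ID is assumed to be a run of digits in parentheses. Anything inside a Recipients section that carries no such ID (blank lines, "PAGE 2", stray carriage returns) is simply skipped.

```python
import re

# Assumption: these are the only other headers, each alone on its line.
OTHER_HEADERS = {"Id", "Time", "Author"}
# Assumption: a recipient ID is a run of digits inside parentheses.
ID_PATTERN = re.compile(r"\((\d+)\)")


def collect_recipients(lines):
    """Return one list of recipient IDs (as strings) per Recipients section."""
    messages = []
    current = None
    in_recipients = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("Recipients"):
            # Start a new list; the first recipient may share this line.
            current = []
            messages.append(current)
            in_recipients = True
        elif line in OTHER_HEADERS:
            # Any other header ends the Recipients section.
            in_recipients = False
        if in_recipients:
            # Lines without an ID (blank, page markers) contribute nothing.
            current.extend(ID_PATTERN.findall(line))
    return messages


sample = """Recipients peter paul (987456789)
jane jackson (74125896)

PAGE 2
walter woods(745896369)
Id
42
"""
print(collect_recipients(sample.splitlines()))
# prints [['987456789', '74125896', '745896369']]
```

Because a file object is itself an iterable of lines, the same function should also accept the open file directly, for example collect_recipients(f) inside the with block. The IDs are returned as strings; wrapping each one in int() would give numbers instead, at the cost of dropping any leading zeros.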