Question

I have a script that prints my tasks in this format:

Thu Apr 04           Finish Work
                     Walk

Sat Apr 06           Collect NIC

Mon Apr 08           Run test

I want to split it into a dictionary so that I can do some matching/formatting:

{'Thu Apr 04' : ('Finish Work', 'Walk'),
'Sat Apr 06' : 'Collect NIC',
'Mon Apr 08' : 'Run test'}

I have tried string functions such as split() and replace(), but I couldn't get the format I want.

Update #1

I assigned the script's output to a variable and ran print repr(output), which gave:

'\nThu Apr 04           Finish PTI Video\n                     Weigh In\n\nSat Apr 06           Collect NIC\n\nMon Apr 08           Serum uric acid test\n\n'

Answer 1

You can try this:

a = '\nThu Apr 04           Finish PTI Video\n                     Weigh In\n                         Eat out\n\nSat Apr 06           Collect NIC\n\nMon Apr 08           Serum uric acid        test\n\n'
b = {}
same_day = ''
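for x in a.split('\n'):
    # The date and the task are separated by a run of 11 spaces
    c = x.split('           ')
    if c[0] == '':
        # Blank or continuation line: append the first non-empty piece to the current day
        for q in c:
            if q != '':
                b.update({same_day: b[same_day] + ', ' + q.strip()})
                break
    else:
        # New date line: remember the day and store its first task
        same_day = c[0]
        b.update({c[0] : c[1]})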

It's dirty, but it gets the job done. If the input is a file, you can use readline to get x.
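For the file case, here is a rough sketch of the same idea wrapped in a function that accepts any iterable of lines (an open file object works, since iterating over it yields one line at a time). The name parse_tasks is made up for illustration, and the sketch assumes that continuation lines start with whitespace and that the date is followed by at least two spaces:

```python
def parse_tasks(lines):
    """Group task lines by date; `lines` may be an open file or a list of strings."""
    tasks = {}
    same_day = None
    for line in lines:
        if not line.strip():
            continue  # skip blank separator lines
        if line[0].isspace():
            # Continuation line: belongs to the most recent date seen
            if same_day is not None:
                tasks[same_day] += ', ' + line.strip()
        else:
            # Assumption: the first run of two spaces ends the date
            day, _, task = line.partition('  ')
            same_day = day.strip()
            tasks[same_day] = task.strip()
    return tasks
```

With a file you would call it as parse_tasks(f) inside a with open(...) as f: block; with the string from the question, parse_tasks(output.splitlines(True)) should behave the same way.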

Answer 2

Suppose you have saved the original script's output (i.e., the sample text you want to parse) in a file named "schedule.txt".

import re
with open("schedule.txt") as f:
    lines = f.readlines()
sched = {}
currday = None
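for line in lines:
    # A line that starts with three words (e.g. "Thu Apr 04") opens a new date entry
    newday = re.match(r'(\w+\s+\w+\s+\w+)\s+(.*)',line)
    if newday:
        currday = newday.group(1)
        sched[currday] = [newday.group(2)]
    elif currday:
        # An indented, non-blank line is another activity for the current date
        newact = re.match(r'\s+(\S.*)',line)
        if newact:
            sched[currday].append(newact.group(1))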

Note that this stores the entries as lists, not tuples. If you really need tuples, you can call tuple() on them.
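If you want exactly the shape shown in the question (a plain string for a single task, a tuple for several), something along these lines should do it. The helper name to_question_format is made up for illustration, and the sample dictionary is typed in by hand rather than read from the file:

```python
def to_question_format(sched):
    """Convert {day: [task, ...]} into the shape shown in the question."""
    result = {}
    for day, tasks in sched.items():
        # A single task stays a plain string; several tasks become a tuple
        result[day] = tasks[0] if len(tasks) == 1 else tuple(tasks)
    return result

# Hand-written sample in the list-based layout described above
sched = {
    'Thu Apr 04': ['Finish Work', 'Walk'],
    'Sat Apr 06': ['Collect NIC'],
    'Mon Apr 08': ['Run test'],
}
formatted = to_question_format(sched)
```

Mixing strings and tuples as values means later code has to check the type of each value, so keeping everything as a list or tuple may be easier to work with.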

Splitting output after the date
