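Extract text between each occurrence of two substrings in a text file

Time: 2019-07-01 15:01:43

Tags: python python-2.7 text-processing

Below is some sample text from a log file. I need to extract all of the text between each occurrence of "Event uploaded" and the next "}". I have also included an example of what I need returned (note that this is only a sample; I will be applying the method to a more general case). My output isn't formatted very well either, it is just an idea. Getting close is fine and I can format from there; the content is what matters most:

Input: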

2019-06-28 15:02:09:918 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Activate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:920 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventIdleSleep, preventSuspendOnSleep (assertion 0x11ff1e710 added: preventIdleSleep; removed: (none))
2019-06-28 15:02:09:921 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10108]
2019-06-28 15:02:09:921 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Creating PowerAssertion on abc-rrre:365
2019-06-28 15:02:09:922 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Sleep revert state: 1
2019-06-28 15:02:09:926 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Created SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:926 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Created PowerAssertion on abc-rrre:365, sleep reverted
2019-06-28 15:02:09:926 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Client relinquished <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:927 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Deactivate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:928 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventSuspendOnSleep (assertion 0x11ff1e710 added: (none); removed: preventIdleSleep)
2019-06-28 15:02:09:929 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10100]
2019-06-28 15:02:09:929 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Releasing PowerAssertion on abc-rrre:365 from update
2019-06-28 15:02:09:930 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Remove assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:931 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Released SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:932 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: -[BKAssertion dealloc] - <0x11ff1e710>
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = 1234567890;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “l323f123f”;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = NA;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = source;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “lasdf23f23”;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:936 - info: [bUSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAdditional : {
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add1 = value;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add2 = false;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     “tsn” = “g254g34gg4g”;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     "time_zone" = EDT;
2019-06-28 15:02:09:938 - info: [bUSLog] [IOS_SYSLOG_ROW] }

Output:

ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = 1234567890; pop = abc; origin = target; "tsn" = “l323f123f”;}
ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = NA; pop = abc; origin = source; "tsn" = “lasdf23f23”;}
ABCAdditional : { add1 = value; add2 = false; pop = abc; origin = target; “tsn” = “g254g34gg4g”; "time_zone" = EDT;}

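I tried using:

    start = 'Event uploaded, '
    end = '}'
    new = entry[entry.find(start) + len(start):entry.rfind(end)]

and several other approaches (including regular expressions), but no luck... Any help would be greatly appreciated, thanks!

Edit (attempt):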

import sys

with open(target_logs) as log:
    do_print = False
    event_key = 'Event uploaded,'

    for line in log:
        line = line.strip()
        if do_print:
            sys.stdout.write(line[line.rfind(']') + 1:].strip())
        if event_key in line:
            do_print = True
            sys.stdout.write(line[line.find(event_key) + len(event_key):].strip())
        elif line.endswith('}'):
            do_print = False
            sys.stdout.write('\n')  # a bare print() would emit "()" on Python 2.7

Picks up:

2019-06-28 15:02:11:672 - info: [bUSLog] [BUS_SYSLOG_ROW] Jun 28 11:02:11 device--Target sharingd(WirelessProximity)[57] <Notice>: Nearby start scanning with data: scan request of type 16, blob: <>, mask <>, active: 0, duplicates: 0, screen on: 300, screen off: 300, rssi: -60, peers: (
2019-06-28 15:02:11:672 - info: [bUSLog] [BUS_SYSLOG_ROW]     "1A02F1A8-5597-4B1F-8802-BA022F789F81",
2019-06-28 15:02:11:673 - info: [bUSLog] [BUS_SYSLOG_ROW]     "A80A3D54-F8F2-D96B-598B-3EF0AE3ABC70",
2019-06-28 15:02:11:673 - info: [bUSLog] [BUS_SYSLOG_ROW]     "B4F0AC04-4A06-92EB-AA85-32002E6675BC",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "D9A5686A-C971-ADEB-A33F-2C772F351D45",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "5B66FA21-AA48-66D8-A619-1C0EA9190597",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "C540AC68-57DF-DA13-3C73-1129E2DD5A6D",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "CCD3C7C8-5069-C9C7-D4B5-FCAD9C2FA15F",
2019-06-28 15:02:11:675 - info: [bUSLog] [BUS_SYSLOG_ROW]     "E6A1699E-91BC-AEB1-DE99-C7C0FB440FAA",
2019-06-28 15:02:11:675 - info: [bUSLog] [BUS_SYSLOG_ROW]     "01480FF0-CD8D-C505-524D-CC139711A730"

2 answers:

Answer 0: (score: 2)

As a first step we do the substitutions (regex101), then split after }\n and remove the newlines:

data = '''2019-06-28 15:02:09:918 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Activate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:920 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventIdleSleep, preventSuspendOnSleep (assertion 0x11ff1e710 added: preventIdleSleep; removed: (none))
2019-06-28 15:02:09:921 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10108]
2019-06-28 15:02:09:921 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Creating PowerAssertion on abc-rrre:365
2019-06-28 15:02:09:922 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Sleep revert state: 1
2019-06-28 15:02:09:926 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Created SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:926 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Created PowerAssertion on abc-rrre:365, sleep reverted
2019-06-28 15:02:09:926 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Client relinquished <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:927 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Deactivate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:928 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventSuspendOnSleep (assertion 0x11ff1e710 added: (none); removed: preventIdleSleep)
2019-06-28 15:02:09:929 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10100]
2019-06-28 15:02:09:929 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Releasing PowerAssertion on abc-rrre:365 from update
2019-06-28 15:02:09:930 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Remove assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:931 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Released SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:932 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: -[BKAssertion dealloc] - <0x11ff1e710>
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = 1234567890;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “l323f123f”;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = NA;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = source;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “lasdf23f23”;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:936 - info: [bUSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAdditional : {
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add1 = value;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add2 = false;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     “tsn” = “g254g34gg4g”;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     "time_zone" = EDT;
2019-06-28 15:02:09:938 - info: [bUSLog] [IOS_SYSLOG_ROW] }'''

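import re

# Strip the log prefix; on event lines keep everything from 'Event uploaded,' onward.
data = re.sub(r'^.*SYSLOG_ROW\]\s*(?:[A-Z].+?(?=Event uploaded,|$))?', r'', data, flags=re.M)
# Drop lines that hold only a quoted string (list items from unrelated log entries).
data = re.sub(r'^"[^"]+",?$', r'', data, flags=re.M)
# Split after each closing brace, join each record onto one line, remove the marker.
# re.sub is used because lstrip('Event uploaded,') strips a character set, not a prefix.
for row in [re.sub(r'^Event uploaded,\s*', '', v.replace('\n', '')) for v in re.split(r'(?<=})\n', data)]:
    print(row)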

Prints:

ABCAccount : {dcis = 0;ttl = 0;bb = 0;r1 = 1234567890;pop = abc;origin = target;"tsn" = “l323f123f”;}
ABCAccount : {dcis = 0;ttl = 0;bb = 0;r1 = NA;pop = abc;origin = source;"tsn" = “lasdf23f23”;}
ABCAdditional : {add1 = value;add2 = false;pop = abc;origin = target;“tsn” = “g254g34gg4g”;"time_zone" = EDT;}
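To make the first substitution easier to follow, here is a small sketch that applies the same pattern to four lines shaped like the sample log. The names `PREFIX`, `sample`, and `cleaned` are only for illustration. Ordinary lines are emptied, the event line is kept from the marker onward, and the lines inside the block lose their log prefix.

```python
import re

# Same pattern as in the answer: consume everything up to 'SYSLOG_ROW]',
# then (optionally) text starting with a capital letter up to the marker
# or to the end of the line.
PREFIX = re.compile(
    r'^.*SYSLOG_ROW\]\s*(?:[A-Z].+?(?=Event uploaded,|$))?', flags=re.M)

sample = '\n'.join([
    '2019-06-28 15:02:09:932 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: -[BKAssertion dealloc] - <0x11ff1e710>',
    '2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {',
    '2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;',
    '2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] }',
])

cleaned = PREFIX.sub('', sample).split('\n')
for line in cleaned:
    print(repr(line))
# ''
# 'Event uploaded, ABCAccount : {'
# 'dcis = 0;'
# '}'
```

Note that the optional group only fires when the text after the prefix starts with a capital letter, so a key inside a block that begins with an uppercase letter would be swallowed as well.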

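Edit (reading from a file):

import re

with open('log.txt', 'r') as f_in:
    data = f_in.read()

# Strip the log prefix; on event lines keep everything from 'Event uploaded,' onward.
data = re.sub(r'^.*SYSLOG_ROW\]\s*(?:[A-Z].+?(?=Event uploaded,|$))?', r'', data, flags=re.M)
# Drop lines that hold only a quoted string (list items from unrelated log entries).
data = re.sub(r'^"[^"]+",?$', r'', data, flags=re.M)
# Split after each closing brace, join each record onto one line, remove the marker.
for row in [re.sub(r'^Event uploaded,\s*', '', v.replace('\n', '')) for v in re.split(r'(?<=})\n', data)]:
    print(row)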
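One thing to keep in mind when removing the `Event uploaded,` marker from the front of a record: `str.lstrip` treats its argument as a set of characters, not as a literal prefix. With event names such as `ABCAccount` the result happens to be right, because `A` is not in that set. The sketch below uses a made-up event name, `dataAccount`, and a hypothetical helper `strip_marker` to show where the two approaches diverge.

```python
import re

marker = 'Event uploaded,'

def strip_marker(record):
    # Remove the literal marker (plus following whitespace) at the start only.
    return re.sub(r'^' + re.escape(marker) + r'\s*', '', record)

ok = 'Event uploaded, ABCAccount : {dcis = 0;}'
risky = 'Event uploaded, dataAccount : {dcis = 0;}'  # hypothetical event name

print(ok.lstrip(marker))     # ABCAccount : {dcis = 0;}
print(risky.lstrip(marker))  # Account : {dcis = 0;}   ('data' is eaten too)
print(strip_marker(risky))   # dataAccount : {dcis = 0;}
```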

Answer 1: (score: 1)

An iterative approach (for Python 3.x):

with open('log.txt') as log:
    do_print = False
    event_key = 'Event uploaded,'  # starting marker

    for line in log:
        line = line.strip()
        if do_print: print(line[line.rfind(']') + 1:].strip(), end=' ')
        if event_key in line:
            do_print = True
            print(line[line.find(event_key) + len(event_key):].strip(), end=' ')
        elif line.endswith('}'):
            do_print = False
            print()

Output:

ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = 1234567890; pop = abc; origin = target; "tsn" = “l323f123f”; } 
ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = NA; pop = abc; origin = source; "tsn" = “lasdf23f23”; } 
ABCAdditional : { add1 = value; add2 = false; pop = abc; origin = target; “tsn” = “g254g34gg4g”; "time_zone" = EDT; } 

For lower Python versions, sys.stdout.write can be used instead of print(..., end=' ').
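As a variation on the same loop, here is a minimal sketch that should run on both Python 2.7 and 3.x. It collects each block into a string and yields it instead of printing piece by piece. The function name `iter_events` and the `sample` lines are assumptions made for illustration, not part of the original answer.

```python
from __future__ import print_function  # no effect on Python 3

def iter_events(lines, event_key='Event uploaded,'):
    """Yield one string per block, from event_key to the next closing brace."""
    parts = None  # None means we are currently outside a block
    for line in lines:
        line = line.strip()
        if parts is None:
            if event_key in line:
                # Keep what follows the marker, e.g. 'ABCAccount : {'
                parts = [line[line.find(event_key) + len(event_key):].strip()]
        else:
            # Inside a block: drop everything up to the last ']' of the prefix
            parts.append(line[line.rfind(']') + 1:].strip())
            if line.endswith('}'):
                yield ' '.join(parts)
                parts = None

sample = [
    '2019-06-28 15:02:09:932 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Sleep revert state: 1',
    '2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {',
    '2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;',
    '2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = NA;',
    '2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] }',
]

for event in iter_events(sample):
    print(event)
# ABCAccount : { dcis = 0; r1 = NA; }
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, for example `iter_events(open('log.txt'))`, and the results can be stored or formatted further rather than only written to stdout.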