How to extract the data that follows certain keywords in a .txt file

Time: 2015-10-29 09:10:44

Tags: python

My experiment results are stored in a .txt file. Below is a sample of output.txt:
Initializing the time of all nodes on network 10.0.0.0 to: 0.0
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Run Experiment:

Start LTG Traffice  AP -> STA
START TIME:2015-10-28 09:17:55.460000

Stop  LTG - AP -> STA
 ....Removing LTG+Saving and Writing Logs file after 0.3 s
END TIME:2015-10-28 09:18:25.467000

Log Sizes:  AP  = 14,155,896 bytes
            STA = 26,162,648 bytes

Initializing the time of all nodes on network 10.0.0.0 to: 0.0
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Run Experiment:

Start LTG Traffice  AP -> STA
START TIME:2015-10-28 09:20:32.499000

Stop  LTG - AP -> STA
 ....Removing LTG+Saving and Writing Logs file after 0.3 s
END TIME:2015-10-28 09:21:02.505000

Log Sizes:  AP  = 14,152,304 bytes
            STA = 26,163,856 bytes
Initializing the time of all nodes on network 10.0.0.0 to: 0.0
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Run Experiment:

Start LTG Traffice  AP -> STA
START TIME:2015-10-28 09:23:09.512000

Stop  LTG - AP -> STA
 ....Removing LTG+Saving and Writing Logs file after 0.3 s
END TIME:2015-10-28 09:23:39.518000

Log Sizes:  AP  = 12,144,180 bytes
            STA = 22,720,608 bytes

After each experiment, the Python script writes the following output to output.txt; it always contains this information:

Initializing the time of all nodes on network 10.0.0.0 to: 0.0
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Run Experiment:

Start LTG Traffice  AP -> STA
START TIME:2015-10-28 09:23:09.512000

Stop  LTG - AP -> STA
 ....Removing LTG+Saving and Writing Logs file after 0.3 s
END TIME:2015-10-28 09:23:39.518000

Log Sizes:  AP  = 12,144,180 bytes
            STA = 22,720,608 bytes

How can I extract the START TIME values and store them in new.txt, like this:

2015-10-28 09:17:55.460000
2015-10-28 09:20:32.499000
2015-10-28 09:23:09.512000

3 answers:

Answer 0: (score: 2)

My attempt is below, based on a regular expression.

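import re

# Read the log as text so that the str pattern can be applied to it.
with open(r"C:\log.txt", 'r') as f:
    text = f.read()

# re.M makes $ match at the end of every line, not only at the end of the file.
pattern = re.findall(r'(?<=START TIME:)([0-9-:\s.]+)$', text, re.M | re.I)
for i in pattern:
    print(i.strip())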

This prints:

2015-10-28 09:17:55.460000
2015-10-28 09:20:32.499000
2015-10-28 09:23:09.512000
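
The snippet above only prints the matches, while the question asks for them to be saved to new.txt. A minimal sketch of that last step, assuming Python 3, could look like the following. The function names `extract_start_times` and `save_start_times` are made up for illustration; the file names are the ones from the question.

```python
import re

# Same pattern as in the answer; re.M lets $ match at the end of each line.
START_TIME_RE = re.compile(r'(?<=START TIME:)([0-9-:\s.]+)$', re.M | re.I)


def extract_start_times(text):
    """Return every START TIME value found in text, stripped of whitespace."""
    return [match.strip() for match in START_TIME_RE.findall(text)]


def save_start_times(src_path, dst_path):
    """Read the log at src_path and write one start time per line to dst_path."""
    with open(src_path, 'r') as src:
        times = extract_start_times(src.read())
    with open(dst_path, 'w') as dst:
        for value in times:
            dst.write(value + '\n')
    return times


# Hypothetical usage with the file names from the question:
# save_start_times('output.txt', 'new.txt')
```

Keeping the extraction separate from the file handling makes it possible to try the pattern on a string before pointing it at a real log.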

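Explanation of the regex

(?<=START TIME:)([0-9-:\s.]+)$

Options: case insensitive; exact spacing; dot does not match line breaks; ^ and $ match at line breaks; regex syntax only

  • Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) (?<=START TIME:)
    • Match the character string "START TIME:" literally (case insensitive) START TIME:
  • Match the regex below and capture its match into backreference number 1 ([0-9-:\s.]+)
    • Match a single character present in the list below [0-9-:\s.]+
      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
      • A character in the range between "0" and "9" 0-9
      • One of the literal characters "-" and ":" -:
      • A "whitespace character" (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) \s
      • The literal character "." .
  • Assert position at the end of a line (at the end of the string or before a line break character) (line feed) $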
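
One consequence of the breakdown above is that \s inside the character class also matches line breaks, which is why the answer calls strip() on every match. The sketch below compares the answer's pattern with an alternative that stops at the end of the line. The alternative pattern is an assumption for illustration and is not part of the original answer.

```python
import re

sample = (
    "Start LTG Traffice  AP -> STA\n"
    "START TIME:2015-10-28 09:17:55.460000\n"
    "\n"
    "Stop  LTG - AP -> STA\n"
)

# Pattern from the answer: \s inside the class also matches "\n".
original = re.findall(r'(?<=START TIME:)([0-9-:\s.]+)$', sample, re.M | re.I)

# Alternative (an assumption, not from the answer): [ \t]* skips optional
# blanks after the colon, and \S.* cannot run past the end of the line
# because "." does not match a line break.
alternative = re.findall(r'^START TIME:[ \t]*(\S.*)$', sample, re.M)

print(repr(original[0]))
print(repr(alternative[0]))
```

On this sample the first result carries a trailing line break and the second does not, so the alternative needs no strip() call.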

Edit

To keep only two decimal places of the seconds, use the following:
import re

with open(r"C:\Users\Winrock\Desktop\log.txt", 'r') as f:
    text = f.read()

pattern = re.findall(r'(?<=START TIME:)([0-9-:\s.]+)$', text, re.M | re.I)
for i in pattern:
    data = i.strip()
    # Drop the last four digits of the six-digit fraction.
    print(data[:-4])
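
Cutting off the last four characters works as long as every timestamp carries exactly six fractional digits, as in the sample log. A sketch that checks the format while shortening it, assuming the timestamps always follow `%Y-%m-%d %H:%M:%S.%f`, is shown below. The helper name `to_two_decimals` is hypothetical.

```python
from datetime import datetime


def to_two_decimals(stamp):
    """Return stamp with the fractional seconds truncated to two digits."""
    # strptime raises ValueError if the text does not match the assumed format.
    parsed = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S.%f')
    # %f always formats as six digits; keep the first two (no rounding).
    fraction = parsed.strftime('%f')[:2]
    return parsed.strftime('%Y-%m-%d %H:%M:%S.') + fraction


print(to_two_decimals('2015-10-28 09:17:55.460000'))
```

Like the slice in the answer, this truncates rather than rounds, so 09:20:32.499000 becomes 09:20:32.49.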

Answer 1: (score: 1)

This code gives you the result:

# Copy the value after "START TIME:" from each matching line into new.txt.
with open('output.txt', 'r') as fin, open('new.txt', 'w') as fout:
    for line in fin:
        if line.startswith('START TIME:'):
            fout.write("%s\n" % line.split('START TIME:')[1].strip())

Answer 2: (score: 0)

I like @SIslam's answer.

Here is an alternative implementation using str.partition:

extractstarttime.py:

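with open('experiment.log', 'r') as efile:
    # 'a' appends, so running the script again adds to the existing file.
    with open('starttimes.log', 'a') as sfile:
        for line in efile:
            if line.startswith('START TIME:'):
                # partition splits at the first colon only.
                starttime = line.partition(':')[2].strip()
                sfile.write(starttime + '\n')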

Output:

$ python extractstarttime.py
$ cat starttimes.log
2015-10-28 09:17:55.460000
2015-10-28 09:20:32.499000
2015-10-28 09:23:09.512000
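
The reason str.partition suits this line format is that it splits at the first colon only, so the colons inside the time of day stay intact. A small sketch on one line from the sample log illustrates the difference from a plain split(':'). The variable names are chosen here for illustration.

```python
line = 'START TIME:2015-10-28 09:23:09.512000\n'

# partition returns a 3-tuple: text before the first ':', the ':' itself,
# and everything after it.
head, sep, tail = line.partition(':')
starttime = tail.strip()

# A plain split(':') breaks the line at every colon, including those in the time.
pieces = line.split(':')

# Limiting split to one cut gives the same value as partition.
same_value = line.split(':', 1)[1].strip()

print(head)
print(starttime)
print(len(pieces))
```

If the separator is missing, partition returns the whole string followed by two empty strings, so index 2 is always safe to read, whereas `split(':', 1)[1]` would raise an IndexError.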