Get timestamps using regex - python3.x

Time: 2019-07-17 18:24:21

Tags: regex python-3.x timestamp pattern-matching

I want to separate all timestamps from the rest of the content in a text file. For example:

a.txt

2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart

"2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart

17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
"mgremove datestring"     asfasnfs: remove datepart check the value
                         "mgremove datestring"     asfasnfs: remove datepart check the value

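My solution below does this by taking the first 4 whitespace-separated words of each line, but that is not generic. I want to make it generic so that it automatically detects the timestamp at the start of each line.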

with open("a.txt") as f:
    for line in f:
        # Treat the first 4 whitespace-separated words as the timestamp
        date_string = " ".join(line.strip().split()[:4])
        print(date_string, line)
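To see why a fixed [:4] slice cannot be generic, it helps to count how many whitespace-separated tokens each sample timestamp occupies. A small illustration using the three timestamp formats copied from a.txt (the list name samples is just for this example):

```python
# Timestamps copied from the sample a.txt; each is split on whitespace
# the same way the [:4] slice above splits a line.
samples = [
    "2019/01/31-11:56:23.288258",
    '"2019-07-17T07:11:14.894Z"',
    "17 Jul 2019 07:01:10",
]

for ts in samples:
    # Number of whitespace-separated tokens this timestamp occupies
    print(len(ts.split()), ts)
```

Only the 17 Jul 2019 07:01:10 form is four tokens long; the other two are a single token each, so any fixed-size slice fits at most one of the formats.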

Expected output:

date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line =  asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line =  asfasnfs: remove datepart

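The text file may also contain other timestamp formats. Is there a way to detect a timestamp at the start of a line and extract it? If a line does not start with a date, the date from the most recent line that had one should be used.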
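A minimal sketch of the regex-based detection the title asks about, assuming the three formats in the sample a.txt are the ones to recognize (other formats need extra patterns) and that month names are English three-letter abbreviations. TIMESTAMP_PATTERNS, TIMESTAMP_RE and split_timestamps are hypothetical names. re.match only matches at the start of the string, which gives the "timestamp at the beginning of the line" behaviour; lines without one reuse the last timestamp seen, and an empty string is used if no timestamp has appeared yet.

```python
import re

# Hypothetical pattern list covering the three formats in the sample a.txt;
# extend it for any other timestamp formats the file may contain.
TIMESTAMP_PATTERNS = [
    r'\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d+',       # 2019/01/31-11:56:23.288258
    r'"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z"',    # "2019-07-17T07:11:14.894Z"
    r'\d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2}',  # 17 Jul 2019 07:01:10
]
# One alternation of non-capturing groups; re.match anchors it at line start
TIMESTAMP_RE = re.compile('|'.join('(?:{})'.format(p) for p in TIMESTAMP_PATTERNS))


def split_timestamps(lines):
    """Yield (date_string, line) pairs, reusing the last timestamp seen
    for lines that do not start with one."""
    last_date = ''
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = TIMESTAMP_RE.match(line)
        if m:
            last_date = m.group(0)  # the line starts with a timestamp
        yield last_date, line


# Demo on a few lines copied from a.txt
sample = [
    '2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart',
    '"2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart',
    '17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart',
    '"mgremove datestring"     asfasnfs: remove datepart check the value',
]
for date_string, line in split_timestamps(sample):
    print('date_string = {} line = {}'.format(date_string, line))
```

For the real file, the open file object can be passed in directly: with open('a.txt') as f, iterate over split_timestamps(f). Keeping one pattern per format makes it easy to add new formats without touching the loop.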

1 answer:

Answer 0 (score: 1)

With a.txt containing:

2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart

"2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart

17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
asfasnfs: remove datepart
                               asfasnfs: remove datepart

This script:

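def get_date_string(line):
    # Heuristic: collect leading words until they exceed 18 characters,
    # which is enough to cover each of the sample timestamp formats.
    rv = ''
    words = line.split()
    while words:
        rv += words.pop(0) + ' '
        if len(rv) > 18:
            break
    return rv.strip()

with open('a.txt', 'r') as f_in:
    last_date_string = ''

    for line in f_in:
        line = line.strip()
        if not line:
            continue  # skip blank lines

        date_part = get_date_string(line)
        if date_part == line:
            # The whole line was consumed, so treat it as having no
            # timestamp and reuse the most recent one.
            print('date string={: <30} line={}'.format(last_date_string, line))
        else:
            print('date string={: <30} line={}'.format(date_part, line))
            last_date_string = date_part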

Prints:

date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=asfasnfs: remove datepart
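The 18-character heuristic treats any line whose first few words reach that length as dated. For example, with the question's original line "mgremove datestring"     asfasnfs: remove datepart check the value, get_date_string returns "mgremove datestring" (quotes included), which the loop would print as a date. One way to guard against that, swapping in datetime.strptime validation on top of the answer's length check, is sketched below. DATE_FORMATS and looks_like_date are hypothetical names, and the format list covers only the three sample formats; %b assumes English month abbreviations, which holds in Python's default C locale.

```python
from datetime import datetime

# Hypothetical strptime formats for the three sample timestamps.
# %b expects English month abbreviations (true in the default C locale).
DATE_FORMATS = [
    '%Y/%m/%d-%H:%M:%S.%f',   # 2019/01/31-11:56:23.288258
    '%Y-%m-%dT%H:%M:%S.%fZ',  # 2019-07-17T07:11:14.894Z (quotes stripped first)
    '%d %b %Y %H:%M:%S',      # 17 Jul 2019 07:01:10
]


def looks_like_date(candidate):
    """Return True if candidate parses with one of DATE_FORMATS."""
    text = candidate.strip('"')  # the ISO-style stamps are quoted in the file
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue  # try the next format
    return False


print(looks_like_date('17 Jul 2019 07:01:10'))   # True
print(looks_like_date('"mgremove datestring"'))  # False
```

In the answer's loop, the check would then become if date_part == line or not looks_like_date(date_part):, so lines whose leading words are not a parseable date fall back to last_date_string instead of being printed as a date.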