Question

Given a text file:

[2018-07-11 20:57:08] SYSTEM RESPONSE: "hello"
[2018-07-11 20:57:19] USER INPUT (xvp_dev-0): "hi! how is it going?"
[2018-07-11 20:57:19] SYSTEM RESPONSE: "It's going pretty good. 
How about you?
What's good?
Up to anything new?
After a long time"

[2018-07-12 14:05:20] USER INPUT (xvp_dev-0): I've been doing good too!    
Thank you for asking.
Nothing is new so far. 
Just working on some projects.
[2018-07-12 20:57:19] SYSTEM RESPONSE: Great!

I want my output to look like:

    [2018-07-11 20:57:08] SYSTEM RESPONSE: "hello"
    [2018-07-11 20:57:19] USER INPUT (xvp_dev-0): "hi! how is it going?"
    [2018-07-11 20:57:19] SYSTEM RESPONSE: "It's going pretty good. How about you?| What's good? Up to anything new?| After a long time"

    [2018-07-12 14:05:20] USER INPUT (xvp_dev-0): I've been doing good too!    |Thank you for asking. | Nothing is new so far. | Just working on some projects.
    [2018-07-12 20:57:19] SYSTEM RESPONSE: Great!

Basically, every line that does not start with a timestamp should be appended to the previous line. So far I have tried:

    import re

    # text_from_index comes from elsewhere in my program, e.g. "file.log,2018-07-11"
    a, b = text_from_index.split(",")  # file name and date
    with open("/home/Desktop/" + a) as log_fd:
        file = log_fd.readlines()

    x = ""
    for line in file:
        if b in line:  # b is the date, e.g. 2018-07-11
            x = x + "//" + line[11:]

    x = x.replace("//", "<br /> \n")
    x = x.replace("]", "|")
    x = re.sub(r'\(.+?\)', '', x)

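So far I can only pick out lines by searching for the date. Any suggestions would help! Thanks! Feel free to ask me questions or for further clarification.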

Answer 1

Store the current line in a variable, e.g. cur_line. If the next line starts with [, write cur_line to the new file; otherwise append that line to cur_line.

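with open('tmp.txt') as in_file, open('out.txt', 'w') as out_file:
    cur_line = ''
    for l in in_file:
        l = l.rstrip('\r\n')
        if not l:
            continue  # skip blank lines
        if l[0] == '[':
            # a new entry starts: flush the previous one, if there is one
            if cur_line:
                out_file.write(cur_line + '\n')
            cur_line = l
        else:
            # continuation line: append it, separated by ' | ' as in the desired output
            cur_line += ' | ' + l
    if cur_line:
        out_file.write(cur_line + '\n')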
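
If you would rather not tie the logic to two specific files, the same idea can be sketched as a generator that accepts any iterable of lines. This is only a sketch: the name merge_continuation_lines and the sep parameter are made up for illustration, and it assumes that every record starts with [ as in the sample log.

```python
def merge_continuation_lines(lines, sep=" | "):
    """Yield one merged record per line that starts with '['."""
    current = None  # record being built; None until the first line is seen
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("[") or current is None:
            # a new record starts: emit the previous one, if any
            if current is not None:
                yield current
            current = line
        else:
            # continuation line: glue it onto the current record
            current += sep + line
    if current is not None:
        yield current


if __name__ == "__main__":
    sample = ["[a] first\n", "second\n", "\n", "[b] third\n"]
    for record in merge_continuation_lines(sample):
        print(record)
```

Because a file object is itself an iterable of lines, you can pass an open file straight in without calling readlines() first.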

Answer 2

You can use a regular expression to do this. The regex below matches your timestamp exactly.

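import re
# raw string, so the backslashes reach the regex engine unchanged
pattern = re.compile(r"\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\]")

# matches a line that starts with your timestamp, so you can treat those
# lines as new entries and concatenate the others (line is one line of the file)
pattern.match(line)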
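
To see why the regex is stricter than checking only the first character, here is a small sketch. The helper name starts_with_timestamp is hypothetical, and the pattern assumes the exact [YYYY-MM-DD HH:MM:SS] layout from the question.

```python
import re

# Assumed layout: [YYYY-MM-DD HH:MM:SS] at the very start of the line.
TIMESTAMP = re.compile(r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\]")


def starts_with_timestamp(line):
    """Return True if the line begins with a bracketed timestamp."""
    # re.match only matches at the beginning of the string.
    return TIMESTAMP.match(line) is not None


if __name__ == "__main__":
    print(starts_with_timestamp('[2018-07-11 20:57:08] SYSTEM RESPONSE: "hello"'))
    print(starts_with_timestamp("How about you?"))
    print(starts_with_timestamp("[note] a continuation line that begins with a bracket"))
```

A continuation line that happens to begin with [ is rejected here, whereas a plain first-character test would treat it as a new entry.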

The full solution looks like this:

import re
pattern = re.compile(r"\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\]")

with open("test.txt") as log_fd:
    file = log_fd.readlines()

x = ""

for line in file:
    if line not in ['\n', '\r\n']:  # skip blank lines
        if pattern.match(line):
            # timestamped line: start a new output line
            x = x + '\n' + line.strip('\r\n')
        else:
            # continuation line: append it to the current output line
            x = x + ' | ' + line.strip('\r\n')

print(x)

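The result will have an empty line at the start of the string, but this solves the problem for your sample and simply prints the result. It is definitely not the most elegant approach.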
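
A more compact variant, offered only as a sketch, does the joining with a single re.sub over the whole text: every newline that is not followed by a timestamp becomes the separator. The function name join_continuations and its sep parameter are made up here, and the approach assumes the log is small enough to read into memory at once.

```python
import re

# Assumed layout of the timestamp that opens every real log entry.
TS = r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\]"


def join_continuations(text, sep=" | "):
    """Join every line that does not start with a timestamp onto the previous line."""
    # Drop blank lines first so they do not turn into empty segments.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    compact = "\n".join(lines)
    # Negative lookahead: replace a newline only when no timestamp follows it.
    return re.sub(r"\n(?!" + TS + ")", sep, compact)


if __name__ == "__main__":
    sample = "[2018-07-11 20:57:19] A\nB\n\nC\n[2018-07-12 20:57:19] D\n"
    print(join_continuations(sample))
```

This version produces no leading empty line, because nothing is prepended before the first entry.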

Append all lines that do not start with a given character to the previous line
