
Question

Let's say I have this file:

1
17:02,111
Problem report related to
router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
now due to compromised data

I want this output:

1
17:02,111
Problem report related to router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk now due to compromised data

I tried it in bash and got close to a solution, but I don't know how to do this in Python.

Thanks in advance.

Answer 1

If you want to remove the extra lines:

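To do this, check two conditions and write a line only if at least one of them holds: the line after it is not an empty line, or the line before it matches the regular expression ^\d{2}:\d{2},\d{3}\s$ (a timestamp line).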
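A small sketch of the timestamp check on its own (the constant name TIMESTAMP is only for illustration). Because the pattern ends in `\s$`, a timestamp line matches only while it still carries one trailing whitespace character, normally its own newline, which every line read from a file has except possibly the last one.

```python
import re

# The timestamp pattern from this answer: hh:mm,mmm followed by exactly one
# whitespace character (normally the line's own "\n"). TIMESTAMP is a hypothetical name.
TIMESTAMP = re.compile(r'^\d{2}:\d{2},\d{3}\s$')

print(bool(TIMESTAMP.match('17:02,111\n')))                   # True: line as read from a file
print(bool(TIMESTAMP.match('17:02,111')))                     # False: no trailing newline
print(bool(TIMESTAMP.match('Problem report related to\n')))   # False: not a timestamp
```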

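So, to look at the next line on each iteration, you can use itertools.tee to create a second iterator, temp, from the main file object and call next on it so it stays one line ahead. re.match is used to match the regular expression.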
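The tee-based lookahead can be isolated into a small generator. This is a sketch, and with_next is a hypothetical helper name: it pairs every item with the one after it, and passing a default to next means the last item is paired with a fill value instead of raising StopIteration, so no try/except is needed.

```python
from itertools import tee

def with_next(iterable, fill=None):
    """Yield (item, next_item) pairs; the last item is paired with `fill`.

    `with_next` is a hypothetical helper used only for illustration.
    """
    current, ahead = tee(iterable)
    next(ahead, None)  # move the copy one step ahead of the original
    for item in current:
        yield item, next(ahead, fill)  # the default avoids StopIteration at the end

lines = ['1\n', '17:02,111\n', 'router\n', '\n']
for line, following in with_next(lines, fill='\n'):
    print(repr(line), '->', repr(following))
```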

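import re
from itertools import tee

with open('ex.txt') as f, open('new.txt', 'w') as out:
    temp, f = tee(f)
    next(temp, None)  # advance temp so it always stays one line ahead of f
    pre = ''          # the previous line; empty before the first iteration
    for line in f:
        # next(temp, '\n') treats the end of the file like a blank line instead of raising StopIteration.
        # Only the last text line of a record is dropped, so this assumes at most one extra line per record.
        if next(temp, '\n') != '\n' or re.match(r'^\d{2}:\d{2},\d{3}\s$', pre):
            out.write(line)
        pre = line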

Result:

1
17:02,111
Problem report related to

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk

If you want to join the rest onto the third line:

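If you want to join the lines that follow the third line onto the third line, you can use the following regular expression to find every block that is followed by \n\n or by the end of the file ($):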

r"(.*?)(?=\n\n|$)"
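One caveat with the lookahead pattern: the way re.finditer handles empty matches changed in Python 3.7 (a non-empty match may now start right after an empty one), so matches can begin with the separating newlines, or even consist of nothing but a newline, unless each match is stripped. As a swapped-in alternative, not the method used in this answer, splitting on blank lines yields the same blocks on any Python 3 version. The helper name blocks is hypothetical.

```python
import re

def blocks(text):
    """Return the non-empty records of `text`, separated by one or more blank lines.

    `blocks` is a hypothetical helper; it replaces the finditer/lookahead approach
    with a plain split on blank lines.
    """
    # "\n", optional whitespace (including further newlines), then "\n" = at least one blank line
    return [b for b in re.split(r'\n\s*\n', text.strip('\n')) if b.strip()]

sample = "1\n17:02,111\nProblem report related to\nrouter\n\n\n2\n17:05,223\nRestarting the systems\n"
for b in blocks(sample):
    print(repr(b))
```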

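Then split each block on its timestamp line and write the parts to the output file. Note that you need to replace the newlines in the third part with spaces: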

ex.txt:

1
17:02,111
Problem report related to
router
another line


2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
now due to compromised data
line 5
line 6
line 7

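Demo:

import re

def splitter(s):
    # Lazily yield each record: the text up to the next blank line ("\n\n") or the end of the string
    for x in re.finditer(r"(.*?)(?=\n\n|$)", s, re.DOTALL):
        g = x.group(0).strip('\n')  # drop separator newlines left at either end of a match
        if g:
            yield g

with open('ex.txt') as f, open('new.txt', 'w') as out:
    for i, block in enumerate(splitter(f.read())):
        # split around the timestamp line; the capture group keeps it in the result
        first, second, third = re.split(r'(\d{2}:\d{2},\d{3}\n)', block, maxsplit=1)
        if i:
            out.write('\n\n')  # blank line between records
        out.write(first + second + third.replace('\n', ' '))
    out.write('\n')

Result:

1
17:02,111
Problem report related to router another line

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk now due to compromised data line 5 line 6 line 7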
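To see what the re.split step in the demo does to a single block, here is a sketch on one hard-coded record. Wrapping the pattern in a capturing group makes re.split keep the matched timestamp line in its result, so the block comes back as exactly three pieces; maxsplit=1 guards against a timestamp-like string inside the text producing extra pieces that would break the three-way unpacking.

```python
import re

block = "1\n17:02,111\nProblem report related to\nrouter"

# The capturing group keeps the delimiter: [before, timestamp_line, after]
first, second, third = re.split(r'(\d{2}:\d{2},\d{3}\n)', block, maxsplit=1)
print([first, second, third])  # ['1\n', '17:02,111\n', 'Problem report related to\nrouter']

# Only the text part gets its newlines turned into spaces
print(first + second + third.replace('\n', ' '))
```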

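Note:

In this answer, splitter is a generator, so it yields one block at a time instead of building a list of all blocks. Keep in mind that f.read() still loads the whole file into memory first, so for very large files a line-by-line approach is a better fit.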
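Because f.read() loads the entire file, here is a sketch of a streaming variant that keeps only the current record in memory. It assumes each record is an index line, a timestamp line and one or more text lines, with records separated by blank lines; records, collapse and convert are hypothetical names, and the file names are the ones used in this answer.

```python
def records(lines):
    """Yield each record as a list of its non-blank lines (hypothetical helper)."""
    record = []
    for raw in lines:
        line = raw.rstrip('\n')
        if line.strip():
            record.append(line)
        elif record:          # a blank line closes the current record
            yield record
            record = []
    if record:                # the file may not end with a blank line
        yield record

def collapse(record):
    """Keep the index and timestamp lines; join the remaining text lines with spaces."""
    head, text = record[:2], record[2:]
    return '\n'.join(head + ([' '.join(text)] if text else []))

def convert(src='ex.txt', dst='new.txt'):
    """Stream src into dst one record at a time (not called here)."""
    with open(src) as f, open(dst, 'w') as out:
        for i, record in enumerate(records(f)):
            if i:
                out.write('\n')  # blank line between records
            out.write(collapse(record) + '\n')
```

Calling convert() reads ex.txt line by line and writes new.txt; since records accepts any iterable of lines, it can also be tried on a plain list.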

Answer 2

This approach only works if the file follows the format of your sample.

Note:

There may be a faster, and possibly simpler, way using regex, but I wanted to do it with plain step-by-step logic.

Code:

with open("output.txt") as inp:
    lines = inp.read().split("\n")

tempString = ""
output = []
for s in lines:
    if s:
        if any(c.isalpha() for c in s):
            # a text line: accumulate it so consecutive text lines get joined
            tempString = tempString + " " + s
        else:
            # an index or timestamp line (no letters): flush pending text, keep the line as-is
            if tempString:
                output.append(tempString.strip())
                tempString = ""
            output.append(s)
    else:
        # a blank separator line: flush pending text, keep the blank line
        if tempString:
            output.append(tempString.strip())
            tempString = ""
        output.append("")
if tempString:
    output.append(tempString.strip())

print("\n".join(output))
with open("newoutput.txt", "w") as out:
    out.write("\n".join(output))
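The same rule (a line is text if it contains any letter) can be expressed more compactly with itertools.groupby, which groups consecutive items that share a key. This is a sketch of that swapped-in technique rather than the code above; has_letters and join_text_runs are hypothetical names, and it relies on the same assumption that index and timestamp lines contain no letters.

```python
from itertools import groupby

def has_letters(line):
    # the rule used above: a line is "text" if it contains any alphabetic character
    return any(c.isalpha() for c in line)

def join_text_runs(lines):
    """Join each run of consecutive text lines into one line; pass other lines through."""
    out = []
    for is_text, run in groupby(lines, key=has_letters):
        if is_text:
            out.append(' '.join(run))
        else:
            out.extend(run)  # index, timestamp and blank lines are kept unchanged
    return out

lines = "1\n17:02,111\nProblem report related to\nrouter\n\n2\n17:05,223\nRestarting the systems".split("\n")
print("\n".join(join_text_runs(lines)))
```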

Input:

1 17:02,111 Problem report related to 2 router 2 17:05,223 Restarting the systems 3 18:02,444 Must erase hard disk now due to compromised data 4 17:02,111 Problem report related to router

Output:

1 17:02,111 Problem report related to 2 router 2 17:05,223 Restarting the systems 3 18:02,444 Must erase hard disk now due to compromised data 4 17:02,111 Problem report related to router

Answer 3

import re

x = """1
17:02,111
Problem report related to
router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
now due to compromised data
or something"""

def repl(matchobj):
    # Split the matched record into lines, keep the index and timestamp lines,
    # and join all remaining text lines into a single line
    ll = matchobj.group().split("\n")
    return "\n".join(ll[:2] + [" ".join(ll[2:])])

print(re.sub(r"\b\d+\n\d+:\d+,\d+\b[\s\S]*?(?=\n{2}|$)", repl, x))

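You can use re.sub with your own custom replacement function: for each record the pattern matches, re.sub calls repl with the match object and substitutes the string it returns.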
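A minimal, self-contained illustration of passing a function to re.sub (upper_word is a hypothetical name): the function receives a match object for every match, and whatever it returns replaces the matched text.

```python
import re

def upper_word(match):
    # re.sub passes a match object; the returned string replaces the matched text
    return match.group(0).upper()

print(re.sub(r'\b\w+ing\b', upper_word, 'restarting the systems, erasing the disk'))
# RESTARTING the systems, ERASING the disk
```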

Python: join specific lines into one line
