Find and replace multiple lines in a huge text file

Time: 2011-04-22 18:59:55

Tags: regex logging text replace find

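I have a huge log file (almost 6 GB) from a game server. Besides the useful records that I need to keep, it contains millions of errors (at the time, the server was producing hundreds of errors per second). I want to remove every line that belongs to an error while keeping the lines that show chat messages or other information.

However, I can't easily delete the lines I want to dump, because the error messages are not always identical and span a different number of lines each time. In short, I cannot tell in advance which lines contain errors, so I need a regular expression to identify them. I have been looking for a program that suits my purpose but have not found one yet. sed (the stream editor), for example, seems suited to this kind of job because it does not need many resources to process such a large file. However, it does not support finding and replacing across multiple lines.

So, is there a program that supports finding and replacing regular expressions across multiple lines in large text files? Or would you recommend writing my own script for the job?

The log file looks like this: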

2011-03-02 01:43:00 [INFO] <admin> CraftBook is causing errors. 
2011-03-02 01:43:01 [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
java.lang.NoSuchMethodError: com.sk89q.worldedit.blocks.BlockType.isRedstoneBlock(I)Z
    at com.sk89q.craftbook.bukkit.MechanicListenerAdapter$MechanicBlockListener.onBlockRedstoneChange(MechanicListenerAdapter.java:174)
    at net.minecraft.server.BlockButton.a(BlockButton.java:170)
    at net.minecraft.server.ItemInWorldManager.a(ItemInWorldManager.java:160)
    at net.minecraft.server.NetServerHandler.a(NetServerHandler.java:482)
    at net.minecraft.server.Packet15Place.a(SourceFile:57)
    at net.minecraft.server.NetworkManager.a(SourceFile:230)
    at net.minecraft.server.NetServerHandler.a(NetServerHandler.java:75)
    at net.minecraft.server.NetworkListenThread.a(SourceFile:100)
    at net.minecraft.server.MinecraftServer.h(MinecraftServer.java:357)
    at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:272)
    at net.minecraft.server.ThreadServerApplication.run(SourceFile:366)
2011-03-02 01:43:01 [INFO] <admin> Is it working yet? 
2011-03-02 01:43:01 [INFO] <admin> Not really. 
2011-03-02 01:43:01 [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
java.lang.NoSuchMethodError: com.sk89q.worldedit.blocks.BlockType.isRedstoneBlock(I)Z
    at com.sk89q.craftbook.bukkit.MechanicListenerAdapter$MechanicBlockListener.onBlockRedstoneChange(MechanicListenerAdapter.java:174)
    at net.minecraft.server.MinecraftServer.h(MinecraftServer.java:348)
    at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:272)
    at net.minecraft.server.ThreadServerApplication.run(SourceFile:366)
2011-03-02 01:43:02 [INFO] <admin> I hope we find a solution as soon as ever possible. 

The desired result is:

2011-03-02 01:43:00 [INFO] <admin> CraftBook is causing errors.
2011-03-02 01:43:01 [INFO] <admin> Is it working yet? 
2011-03-02 01:43:01 [INFO] <admin> Not really. 
2011-03-02 01:43:02 [INFO] <admin> I hope we find a solution as soon as ever possible.

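As you can see, the log file contains the same error over and over again. Although it always starts with the date and time followed by [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms, and ends with at net.minecraft.server.ThreadServerApplication.run(SourceFile:366), the error details in between differ each time. That is why I can't simply replace one fixed error message with an empty string.

Is there a regular expression that would help me get rid of all lines belonging to the errors while keeping the remaining lines? That would shrink my log file to under 50 MB, since all of these errors were caused by a broken plugin on my server.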

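2 answers:

Answer 0 (score: 2)

This Python script makes a single pass over the log file, which it reads from stdin, and prints the filtered log messages to stdout.

It uses a regular expression to match lines that mark the start of a log message (for example, lines that begin with 2011-03-02 01:43:00 [).

If the line that starts a log message contains [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms, the script discards that line and every line after it, up to the line that starts the next log message. Otherwise, it outputs the line. You can think of this as a finite state machine with two states, corresponding to whether the script is currently skipping lines or outputting them.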

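import re
import sys

# A line that begins a new log message starts with a timestamp.
START_OF_MESSAGE_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
# The first line of the error message that should be dropped.
ERROR_RE = re.compile(
    START_OF_MESSAGE_RE.pattern
    + r" \[SEVERE\] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms$"
)

skip_until_next_message = False

for line in sys.stdin:
    line = line.rstrip()  # removes the newline and any other trailing whitespace
    if START_OF_MESSAGE_RE.match(line):
        # A new message begins: skip it only if it is the known error.
        skip_until_next_message = bool(ERROR_RE.match(line))
    if not skip_until_next_message:
        print(line)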

I added a few special cases to the log file for testing. This is the log file I tested with:

2011-03-02 01:43:00 [INFO] <admin> CraftBook is causing errors. 
2011-03-02 01:43:01 [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
java.lang.NoSuchMethodError: com.sk89q.worldedit.blocks.BlockType.isRedstoneBlock(I)Z
    at com.sk89q.craftbook.bukkit.MechanicListenerAdapter$MechanicBlockListener.onBlockRedstoneChange(MechanicListenerAdapter.java:174)
    at net.minecraft.server.BlockButton.a(BlockButton.java:170)
    at net.minecraft.server.ItemInWorldManager.a(ItemInWorldManager.java:160)
    at net.minecraft.server.NetServerHandler.a(NetServerHandler.java:482)
    at net.minecraft.server.Packet15Place.a(SourceFile:57)
    at net.minecraft.server.NetworkManager.a(SourceFile:230)
    at net.minecraft.server.NetServerHandler.a(NetServerHandler.java:75)
    at net.minecraft.server.NetworkListenThread.a(SourceFile:100)
    [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
    at net.minecraft.server.MinecraftServer.h(MinecraftServer.java:357)
    at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:272)
    at net.minecraft.server.ThreadServerApplication.run(SourceFile:366)
2011-03-02 01:43:01 [INFO] <admin> Is it working yet? 
2011-03-02 01:43:01 [INFO] <admin> Not really. 
2011-03-02 01:43:01 [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
java.lang.NoSuchMethodError: com.sk89q.worldedit.blocks.BlockType.isRedstoneBlock(I)Z
    at com.sk89q.craftbook.bukkit.MechanicListenerAdapter$MechanicBlockListener.onBlockRedstoneChange(MechanicListenerAdapter.java:174)
    at net.minecraft.server.MinecraftServer.h(MinecraftServer.java:348)
    at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:272)
    at net.minecraft.server.ThreadServerApplication.run(SourceFile:366)
2011-03-02 01:43:02 [INFO] <admin> I hope we find a solution as soon as ever possible. 
2011-03-02 01:43:01 [SEVERE] Another multi
line
log
message
2011-03-02 01:43:01 [INFO] <admin> Here's the error: [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms

Here is the output:

$ python minecraftlog.py < minecraft.log 
2011-03-02 01:43:00 [INFO] <admin> CraftBook is causing errors.
2011-03-02 01:43:01 [INFO] <admin> Is it working yet?
2011-03-02 01:43:01 [INFO] <admin> Not really.
2011-03-02 01:43:02 [INFO] <admin> I hope we find a solution as soon as ever possible.
2011-03-02 01:43:01 [SEVERE] Another multi
line
log
message
2011-03-02 01:43:01 [INFO] <admin> Here's the error: [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
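
As a rough sketch, the same two-state filter can also be wrapped in a generator, which makes it easy to try on a handful of lines before running it against the full 6 GB file. The names filter_log and filter_file are illustrative, not part of the script above, and the UTF-8 encoding with errors="replace" is an assumption about the log file.

```python
import re

# Same patterns as the stdin-based script; only the packaging differs.
START_OF_MESSAGE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
ERROR = re.compile(
    START_OF_MESSAGE.pattern
    + r" \[SEVERE\] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms$"
)


def filter_log(lines):
    """Yield the lines that are not part of a REDSTONE_CHANGE error message."""
    skipping = False
    for line in lines:
        line = line.rstrip()
        if START_OF_MESSAGE.match(line):
            # State change happens only on lines that start a new message.
            skipping = ERROR.match(line) is not None
        if not skipping:
            yield line


def filter_file(src_path, dst_path):
    """Stream src_path to dst_path line by line (hypothetical helper).

    Assumes UTF-8 text; undecodable bytes are replaced rather than raising.
    """
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in filter_log(src):
            dst.write(line + "\n")
```

Because the file object is iterated lazily, only one line is held in memory at a time, so the size of the log should not matter much.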

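Answer 1 (score: 0)

It seems a better approach would be to match the lines you want to keep, which indirectly "deletes" the lines you don't care about.

The following Perl script should suffice: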

while (<>) {
  next unless /^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s\[INFO\]/;
  print;
}
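
For comparison, here is a minimal Python sketch of the same keep-list idea. The name keep_info_lines is hypothetical. Note the trade-off of this approach: because only lines that start with a timestamp followed by [INFO] are kept, messages at any other level (and all of their continuation lines) are dropped as well, not just the REDSTONE_CHANGE errors.

```python
import re

# Keep only lines that start with a timestamp followed by [INFO].
INFO_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s\[INFO\]")


def keep_info_lines(lines):
    """Lazily yield the lines matching INFO_LINE; everything else is dropped."""
    return (line for line in lines if INFO_LINE.match(line))
```

To use it as a filter from the shell, something like sys.stdout.writelines(keep_info_lines(sys.stdin)) (after import sys) should work, since the lines keep their trailing newlines.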