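I'm preparing some WhatsApp chat logs to render statistics and word clouds. However, my data occasionally contains double line breaks, which break the log format, and I'd like to know how to fix this automatically.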
13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats
well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
Searching for and deleting the empty lines was an easy fix. However, I am still left with lines that break the date-and-time format:
13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats
well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
Target format:
13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
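Perhaps the solution lies in exploiting this rule: the line breaks I need to keep follow the pattern:
TEXT *linebreak*
NUMBER (beginning of the date column)
The troublesome ones follow the pattern:
TEXT *linebreak*
TEXT
How can I fix this using Notepad++?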
Answer 0 (score: 1)
In the Search and Replace dialog, you can search for this pattern:
\r\n(?!\d)
Enable regular expressions and replace with nothing (or with a single space, to keep the joined words apart as in your target format).
\r\n
matches a line break consisting of CR and LF. Turn on the display of control characters in Notepad++ to see which kind of line breaks your file has.
(?!\d)
is a negative lookahead assertion, which is true when no digit follows. This works for your example but can fail in some corner cases, for instance when a continuation line itself starts with a digit. You can extend it to a pattern such as
(?!\d{2})
when the day is always two digits.
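If you would rather script the cleanup than run it in Notepad++, the same idea can be sketched in Python. This is only an illustration under a few assumptions: the function name join_continuation_lines is hypothetical, the pattern accepts both CRLF and bare LF line breaks, it uses the two-digit-day variant of the lookahead, and it leaves a line break at the very end of the file alone. A continuation line that happens to start with two digits would still be mistaken for a new message.

```python
import re

# A line break (CRLF or bare LF) that is NOT followed by a two-digit day
# and is NOT the final line break of the text.
# Assumption: every real message starts with a two-digit day, e.g. "13 Mar".
STRAY_BREAK = re.compile(r"\r?\n(?!\d{2}|\Z)")


def join_continuation_lines(log: str) -> str:
    """Replace stray line breaks with a space so each message stays on one line."""
    return STRAY_BREAK.sub(" ", log)


sample = (
    "13 Mar 18:52 - nicola: and he has no natural grace like most cats\r\n"
    "well no i didn't lol\r\n"
    "13 Mar 18:52 - nicola: you saw the last video\r\n"
)

print(join_continuation_lines(sample))
```

Replacing with a space rather than an empty string matches the target format shown in the question, where the joined fragments are separated by a single space.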