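I'm looking for some help writing Perl code to sort a log file.
I'm relatively new to coding and to Perl!
I need to write my code using core Perl modules as far as possible, but if that turns out to be impossible, I'm open to CPAN modules. The log file contains a list of logged messages that need to be put back into order. It should be simple, but there are a lot of gotchas, and they are making it hard for me to work out how to design my data structures. The input file is CSV. The output needs to contain the same messages in timestamp order, with the parts of each concatenated message grouped together, starting with the first part.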
Gotchas
It's probably best if I just post some sample input data, followed by how it should come out.
Input data
#message uniqueID,From,To,Time,flag,content,IP,concatenation info
1,"+1231231234","+15125562100","7 Sep 2012 22:08:33","","abcdefghijklmnopqrstuvwxyz",,
2,"+1231231234","+15125562100","7 Sep 2012 22:08:37","","abcdefghijklmnopqrstuvwxyz",,
3,"+1231231234","+15125562100","7 Sep 2012 22:08:41","","abcdefghijklmnopqrstuvwxyz",,
4,"+8888888888","+15125562100","7 Sep 2012 22:09:01","","SHORTUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wi",,"BQADAQMB (part 1 of 3 of message reference 1)"
5,"+8888888888","+15125562100","7 Sep 2012 22:09:04","","h my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall ",,"BQADAQMC (part 2 of 3 of message reference 1)"
6,"+8888888888","+15125562100","7 Sep 2012 22:09:05","","ress, ah, nevermore!",,"BQADAQMD (part 3 of 3 of message reference 1)"
7,"+8888888888","+15125562100","7 Sep 2012 22:09:06","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIDAQ== (part 1 of 3 of message reference 2)"
8,"+8888888888","+15125562100","7 Sep 2012 22:09:07",""," my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall p",,"BggEAAIDAg== (part 2 of 3 of message reference 2)"
10,"+1231231234","+15125562100","7 Sep 2012 22:09:46","","abcdefghijklmnopqrstuvwxyz",,
11,"+1231231234","+15125562100","7 Sep 2012 22:09:50","","abcdefghijklmnopqrstuvwxyz",,
12,"+1231231234","+15125562100","7 Sep 2012 22:09:55","","abcdefghijklmnopqrstuvwxyz",,
13,"+8888888888","+15125562100","13 Sep 2012 22:10:36","","SHORTUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wi",,"BQADAQMB (part 1 of 3 of message reference 1)"
14,"+8888888888","+15125562100","13 Sep 2012 22:10:38","","h my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall ",,"BQADAQMC (part 2 of 3 of message reference 1)"
15,"+8888888888","+15125562100","13 Sep 2012 22:10:39","","ress, ah, nevermore!",,"BQADAQMD (part 3 of 3 of message reference 1)"
16,"+8888888889","+15125562100","7 Sep 2012 22:09:06","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIDAQ== (part 1 of 3 of message reference 2)"
17,"+8888888889","+15125562100","7 Sep 2012 22:10:42",""," my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall p",,"BggEAAIDAg== (part 2 of 3 of message reference 2)"
18,"+8888888889","+15125562100","7 Sep 2012 22:10:43","","ess, ah, nevermore!",,"BggEAAIDAw== (part 3 of 3 of message reference 2)"
19,"+1231231234","+15125562100","13 Sep 2012 20:12:52","","Deposit SMS with readreceiptrequest = false #0",,
20,"+1231231234","+15125562100","13 Sep 2012 20:12:53","","Deposit SMS with readreceiptrequest = false #1",,
21,"+1231231234","+15125562100","13 Sep 2012 20:12:54","","Deposit SMS with readreceiptrequest = false #2",,
22,"+8888888888","+15125562100","13 Sep 2012 20:12:55","","Deposit SMS with readreceiptrequest = false #0: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms ",,"BQADAAMB (part 1 of 3 of message reference 0)"
23,"+8888888888","+15125562100","13 Sep 2012 20:12:57","","ore; This and more I sat divining, with my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with",,"BQADAAMC (part 2 of 3 of message reference 0)"
24,"+8888888888","+15125562100","13 Sep 2012 20:12:58","","the lamplight gloating oer She shall press, ah, nevermore!",,"BQADAAMD (part 3 of 3 of message reference 0)"
25,"+8888888888","+15125562100","7 Sep 2012 22:10:40","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIEAQ== (part 1 of 2 of message reference 3)"
26,"+8888888888","+15125562100","7 Sep 2012 22:10:42","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIEAQ== (part 1 of 2 of message reference 3)"
27,"+8888888888","+15125562100","7 Sep 2012 22:10:43","","ess, ah, nevermore!",,"BggEAAIEAw== (part 2 of 2 of message reference 3)"
28,"+8888888888","+15125562100","13 Sep 2012 20:13:02","","Deposit SMS with readreceiptrequest = false #2: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms ",,"BQADAgMB (part 1 of 3 of message reference 2)"
29,"+8888888888","+15125562100","13 Sep 2012 20:13:03","","ore; This and more I sat divining, with my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with",,"BQADAgMC (part 2 of 3 of message reference 2)"
30,"+8888888888","+15125562100","13 Sep 2012 20:13:04","","the lamplight gloating oer She shall press, ah, nevermore!",,"BQADAgMD (part 3 of 3 of message reference 2)"
31,"+1231231234","+15125562100","13 Sep 2012 20:13:08","","Deposit SMS with readreceiptrequest = true #0",
Output data
#message uniqueID,From,To,Time,flag,content,IP,concatenation info
1,"+1231231234","+15125562100","7 Sep 2012 22:08:33","","abcdefghijklmnopqrstuvwxyz",,
2,"+1231231234","+15125562100","7 Sep 2012 22:08:37","","abcdefghijklmnopqrstuvwxyz",,
3,"+1231231234","+15125562100","7 Sep 2012 22:08:41","","abcdefghijklmnopqrstuvwxyz",,
4,"+8888888888","+15125562100","7 Sep 2012 22:09:01","","SHORTUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wi",,"BQADAQMB (part 1 of 3 of message reference 1)"
5,"+8888888888","+15125562100","7 Sep 2012 22:09:04","","h my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall ",,"BQADAQMC (part 2 of 3 of message reference 1)"
6,"+8888888888","+15125562100","7 Sep 2012 22:09:05","","ress, ah, nevermore!",,"BQADAQMD (part 3 of 3 of message reference 1)"
16,"+8888888889","+15125562100","7 Sep 2012 22:09:06","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIDAQ== (part 1 of 3 of message reference 2)"
17,"+8888888889","+15125562100","7 Sep 2012 22:10:42",""," my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall p",,"BggEAAIDAg== (part 2 of 3 of message reference 2)"
18,"+8888888889","+15125562100","7 Sep 2012 22:10:43","","ess, ah, nevermore!",,"BggEAAIDAw== (part 3 of 3 of message reference 2)"
7,"+8888888888","+15125562100","7 Sep 2012 22:09:06","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIDAQ== (part 1 of 3 of message reference 2)"
8,"+8888888888","+15125562100","7 Sep 2012 22:09:07",""," my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall p",,"BggEAAIDAg== (part 2 of 3 of message reference 2)"
10,"+1231231234","+15125562100","7 Sep 2012 22:09:46","","abcdefghijklmnopqrstuvwxyz",,
11,"+1231231234","+15125562100","7 Sep 2012 22:09:50","","abcdefghijklmnopqrstuvwxyz",,
12,"+1231231234","+15125562100","7 Sep 2012 22:09:55","","abcdefghijklmnopqrstuvwxyz",,
25,"+8888888888","+15125562100","7 Sep 2012 22:10:40","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIEAQ== (part 1 of 2 of message reference 3)"
26,"+8888888888","+15125562100","7 Sep 2012 22:10:42","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIEAQ== (part 1 of 2 of message reference 3)"
27,"+8888888888","+15125562100","7 Sep 2012 22:10:43","","ess, ah, nevermore!",,"BggEAAIEAw== (part 2 of 2 of message reference 3)"
19,"+1231231234","+15125562100","13 Sep 2012 20:12:52","","Deposit SMS with readreceiptrequest = false #0",,
20,"+1231231234","+15125562100","13 Sep 2012 20:12:53","","Deposit SMS with readreceiptrequest = false #1",,
21,"+1231231234","+15125562100","13 Sep 2012 20:12:54","","Deposit SMS with readreceiptrequest = false #2",,
22,"+8888888888","+15125562100","13 Sep 2012 20:12:55","","Deposit SMS with readreceiptrequest = false #0: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms ",,"BQADAAMB (part 1 of 3 of message reference 0)"
23,"+8888888888","+15125562100","13 Sep 2012 20:12:57","","ore; This and more I sat divining, with my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with",,"BQADAAMC (part 2 of 3 of message reference 0)"
24,"+8888888888","+15125562100","13 Sep 2012 20:12:58","","the lamplight gloating oer She shall press, ah, nevermore!",,"BQADAAMD (part 3 of 3 of message reference 0)"
28,"+8888888888","+15125562100","13 Sep 2012 20:13:02","","Deposit SMS with readreceiptrequest = false #2: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms ",,"BQADAgMB (part 1 of 3 of message reference 2)"
29,"+8888888888","+15125562100","13 Sep 2012 20:13:03","","ore; This and more I sat divining, with my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with",,"BQADAgMC (part 2 of 3 of message reference 2)"
30,"+8888888888","+15125562100","13 Sep 2012 20:13:04","","the lamplight gloating oer She shall press, ah, nevermore!",,"BQADAgMD (part 3 of 3 of message reference 2)"
31,"+1231231234","+15125562100","13 Sep 2012 20:13:08","","Deposit SMS with readreceiptrequest = true #0",
13,"+8888888888","+15125562100","13 Sep 2012 22:10:36","","SHORTUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wi",,"BQADAQMB (part 1 of 3 of message reference 1)"
14,"+8888888888","+15125562100","13 Sep 2012 22:10:38","","h my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall ",,"BQADAQMC (part 2 of 3 of message reference 1)"
15,"+8888888888","+15125562100","13 Sep 2012 22:10:39","","ress, ah, nevermore!",,"BQADAQMD (part 3 of 3 of message reference 1)"
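What I've done so far
Now I'm stuck working out the best way to filter and sort the data efficiently. I've tried using a hash and loading the whole file into memory first so that I can sort on specific message references, but I'm not sure that will work for large files.
Then I thought about reading it line by line. The problem is that the second line might contain the first part of a concatenated SMS, and the later parts might not turn up until the very end of the file. So I thought that might not be a good idea either.
I also thought about a database, but I think it's too complicated to set up on the system this needs to run on. Another option would be to write packages and store the complex structures as objects? Maybe I'm overcomplicating things? My brain is definitely turning to mush!
Anyway, any ideas or guidance would be much appreciated.
Hopefully the above is clear, but if you have any questions, please get in touch.
Thanks, Will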
Answer 0 (score: 2)
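I don't think this problem is too complicated if you break it down correctly.
As I see it, your sorting program would consist of the following stages:
When sorting in Perl, the Schwartzian transform is a common pattern. It helps when the sort key has to be extracted from the data being sorted. Instead of extracting the key on every comparison, it extracts it once per element, which speeds up the sort. It is also described as decorate-sort-undecorate.
Example: sorting strings by length. Note that in this case the naive implementation would actually be better.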
my @words = qw( aaa b cccc );
my @sorted_words =
map { $_->[1] } # flatten
sort { $a->[0] <=> $b->[0] } # sort by first field (length)
map { [ length $_, $_ ] } # decorate: return arrayref with key and data
@words;
print "[@sorted_words]\n"; # prints "[b aaa cccc]"
Keeping this pattern in mind for your task, for each line we output an array reference (or something similar) with these fields:
0: timestamp (in epoch seconds)
1: part no.          \
2: total parts        | these are undef if no concatenation info is present
3: message reference /
4: the unmodified line
For the CSV extraction you should use Text::CSV, and for calculating the epoch you should look at DateTime.
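Text::CSV and DateTime are CPAN modules. The question asks for core modules where possible, so here is a minimal sketch of the extraction step using only core Perl. It uses Time::Local's timegm for the epoch and a limited split instead of a CSV parser. That shortcut is an assumption based on the sample data: the uniqueID, From, To and Time fields never contain commas. If that doesn't hold for your real logs, Text::CSV is the safer choice. Timestamps are treated as UTC only to get a consistent ordering, and the name extract is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);    # core module, used here in place of DateTime

# Month abbreviations as they appear in the Time column, e.g. "7 Sep 2012 22:08:33".
my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Decorate one log line as [epoch, part, total, ref, line].
# Assumption (from the sample data): uniqueID, From, To and Time never contain
# commas, so a limited split reaches the Time field without Text::CSV.
sub extract {
    my ($line) = @_;
    chomp $line;
    my $time = (split /,/, $line, 5)[3] // '';
    $time =~ s/^"|"$//g;    # strip the surrounding quotes

    my ($mday, $mon, $year, $hh, $mm, $ss) =
        $time =~ /^(\d{1,2}) ([A-Z][a-z]{2}) (\d{4}) (\d\d):(\d\d):(\d\d)$/
        or die "Unparseable time '$time' in: $line\n";
    exists $MONTH{$mon} or die "Unknown month '$mon' in: $line\n";

    # Interpreted as UTC only to get a consistent ordering (no DST gaps);
    # timegm takes a 4-digit year literally.
    my $epoch = timegm($ss, $mm, $hh, $mday, $MONTH{$mon}, $year);

    # Concatenation info, e.g. "BQADAQMB (part 1 of 3 of message reference 1)";
    # all three stay undef when the line has none.
    my ($part, $total, $ref) =
        $line =~ /\(part (\d+) of (\d+) of message reference (\d+)\)"?\s*$/;

    return [ $epoch, $part, $total, $ref, $line ];
}

# Two lines copied from the sample input.
my @sample = (
    '1,"+1231231234","+15125562100","7 Sep 2012 22:08:33","","abcdefghijklmnopqrstuvwxyz",,',
    '6,"+8888888888","+15125562100","7 Sep 2012 22:09:05","","ress, ah, nevermore!",,"BQADAQMD (part 3 of 3 of message reference 1)"',
);
for my $rec (map { extract($_) } @sample) {
    printf "%d part=%s total=%s ref=%s\n", $rec->[0], map { $_ // '-' } @$rec[1 .. 3];
}
```

Each returned arrayref has the field layout listed above, so it can be fed straight into the grouping stage.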
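We define a cache as a hash, with the message reference as the key and a group as the value. A group is an arrayref in the extraction format specified above, but it can hold additional lines at position 5 onward (i.e. every decorated line is already a group).
For each decorated line we receive, we perform the following steps: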
# pseudocode
# this is how I understood your requirements,
# but it may be wrong. The general principle still holds
# (you may need to choose a different key)
IF the line doesn't have part information, THEN
    pass it on immediately.
ELSE
    IF the hash has an entry for our message reference, THEN
        IF the timestamp of the present group is too old, THEN
            pass on the existing group.
            Add our line for this key.
        ELSE
            Update the group with our line,
            adding the original line (at position 3 + part no),
            but not the metadata to the group.
            IF the group is made complete, THEN
                pass it on immediately,
                delete this entry from the hash.
    ELSE
        Add the line as a group.
        Make sure the content is at position 3 + part no, to allow easy updating.
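One way the pseudocode above might translate into Perl is sketched below. It is a sketch, not a definitive implementation, and it assumes records in the [epoch, part, total, ref, line] shape from the extraction step. Several choices here are my own assumptions:

- The 300-second $MAX_AGE is arbitrary.
- The cache is keyed on sender plus reference. In the sample, +8888888888 and +8888888889 both use reference 2 at 22:09:06, so keying on the reference alone would mix their parts.
- When a part number repeats, the existing group is passed on. Rows 25 and 26 are both part 1 of reference 3.
- A group's timestamp is its earliest part.

The helper names are hypothetical.

```perl
use strict;
use warnings;

my $MAX_AGE = 300;    # hypothetical: seconds before a partial group counts as "too old"

# Hypothetical key: sender (From field) plus message reference.
sub group_key {
    my ($rec) = @_;
    my $from = (split /,/, $rec->[4])[1];
    return "$from|$rec->[3]";
}

# A group is [epoch, part, total, ref, line_part_1, line_part_2, ...]:
# the line for part N sits at index 3 + N.
sub is_complete {
    my ($g) = @_;
    return !grep { !defined $g->[3 + $_] } 1 .. $g->[2];
}

# Takes decorated records [epoch, part, total, ref, line] in file order and
# returns groups in the order they are passed on to the sorting stage.
sub group_records {
    my (@out, %cache);    # %cache: group key => incomplete group
    for my $rec (@_) {
        my ($epoch, $part, $total, $ref, $line) = @$rec;
        if (!defined $part) {    # no part information: already a one-line group
            push @out, $rec;
            next;
        }
        my $key = group_key($rec);
        my $g   = $cache{$key};
        # Pass on the cached group if it is too old or, unlike the pseudocode,
        # if it already holds this part number (duplicate parts such as rows 25/26).
        if ($g && ($epoch - $g->[0] > $MAX_AGE || defined $g->[3 + $part])) {
            push @out, delete $cache{$key};
            undef $g;
        }
        if (!$g) {
            $g = $cache{$key} = [ $epoch, $part, $total, $ref ];
        }
        $g->[0] = $epoch if $epoch < $g->[0];    # a group sorts by its earliest part
        $g->[3 + $part] = $line;                 # the line only, not its metadata
        push @out, delete $cache{$key} if is_complete($g);
    }
    push @out, @cache{ sort keys %cache };    # end of input: flush incomplete groups
    return @out;
}

# Toy records shaped like rows 25-27 of the sample (epochs and lines shortened).
my @demo = (
    [1000, 1, 2, 3, '25,"+8888888888",...'],
    [1002, 1, 2, 3, '26,"+8888888888",...'],
    [1003, 2, 2, 3, '27,"+8888888888",...'],
);
for my $g (group_records(@demo)) {
    print join(' + ', map { (split /,/)[0] } grep { defined } @$g[4 .. $#$g]), "\n";
}
# prints "25" then "26 + 27"
```

Only incomplete groups stay in %cache. Each group is removed with delete as soon as it is passed on, which is what keeps memory use bounded.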
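Once there are no more lines, we pass every remaining value in the hash on to the next stage.
The important thing to realize is that you don't have to keep all the lines in memory here, only the incomplete groups.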
The interesting Perl functions here are exists $hash{element} and delete $hash{element}. delete may be important for saving memory.
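Here is a small illustration of exists and delete on a cache like the one described above; the cache contents are a toy example. Note that delete returns the value it removes. That means "pass on the group and drop it from the hash" can be a single step, and Perl can free the group once nothing else refers to it.

```perl
use strict;
use warnings;

# Toy cache: message reference => incomplete group (contents shortened).
my %cache = (2 => [ 'part 1 line', 'part 2 line' ]);

# exists asks whether the key is present, without creating it.
print exists $cache{2} ? "ref 2 is pending\n" : "ref 2 is not pending\n";

# delete removes the entry and returns its value in one step.
my $group = delete $cache{2};
print scalar(@$group), " lines passed on\n";
print exists $cache{2} ? "still cached\n" : "no longer cached\n";
```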
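We simply sort all the elements by timestamp. If there is too much data in total for the system to handle, we can use a trick:
However, this is time-consuming.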
Here we only receive items that are already sorted and grouped. All we have to do is output the lines they contain in the correct order.
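The sorting and output stages could look roughly like this. The sketch assumes groups in the shape described above ([epoch, part, total, ref, line1, line2, ...]) that are small enough to sort in memory; it does not cover the trick for larger data. It sorts indices and uses the index as a tie-breaker. That way, groups with equal timestamps keep the order in which the grouping stage passed them on, without relying on sort's own stability. The function name sort_and_print is hypothetical.

```perl
use strict;
use warnings;

# Sorting and output stages. A group is [epoch, part, total, ref, line1, line2, ...];
# an unconcatenated line is simply a one-line group.
sub sort_and_print {
    my ($fh, @groups) = @_;
    # Sort indices by timestamp; comparing indices breaks ties, so groups with
    # equal timestamps keep the order in which they were passed on.
    my @order = sort { $groups[$a][0] <=> $groups[$b][0] || $a <=> $b } 0 .. $#groups;
    for my $g (@groups[@order]) {
        print {$fh} "$_\n" for grep { defined } @$g[4 .. $#$g];    # lines in part order
    }
}

# Toy groups (epochs and lines made up for illustration).
my @toy = (
    [30, undef, undef, undef, 'line C'],
    [10, 1, 2, 5, 'line A1', 'line A2'],
    [30, 1, 1, 6, 'line D'],
    [20, undef, undef, undef, 'line B'],
);
sort_and_print(\*STDOUT, @toy);    # line A1, line A2, line B, line C, line D
```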
Answer 1 (score: 0)
I would do it in two stages: combining the message parts, then sorting. That should simplify the problem somewhat.
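First, I would use an external sort utility (e.g. GNU sort) to sort by message number. That will at least bring all the parts with the same message number together. A simple sort <inputfile >outputfile will do what you need. What you really care about is getting all the lines that start with, say, 371,"... next to each other.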
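Then you can write a Perl program that reads that output and accumulates lines with the same message number. When you see a different message number, filter the lines you've accumulated to combine the message from its parts, and write that record to a file. You may want to write the output in a form that is easier to sort, perhaps by putting the fields you sort on at the front of the record, zero-padded where necessary.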
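Below is a minimal sketch of that middle step. It assumes an earlier sort has already made the lines that belong together adjacent. The answer doesn't pin down which field is the "message number". In the sample the first column is unique per line, so you would probably need something like sender plus message reference. For that reason the key and timestamp extractors are passed in as code refs. The \x1F separator and the 12-digit padding are arbitrary choices. Everything works on in-memory lists for brevity; a real run would read and write files between the two sort passes. The function names are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min);    # core module

my $SEP = "\x1F";    # hypothetical separator; must never occur in the log itself

# Combine runs of adjacent lines that share a key into one record, prefixed by
# a zero-padded sort field so that a plain lexical sort orders records numerically.
#   $key_of   - code ref: line => grouping key ("message number")
#   $epoch_of - code ref: line => numeric timestamp
sub combine_runs {
    my ($key_of, $epoch_of, @lines) = @_;
    my (@records, @run, $run_key);
    my $flush = sub {
        return unless @run;
        my $epoch = min map { $epoch_of->($_) } @run;    # sort on the earliest part
        push @records, sprintf('%012d', $epoch) . "\t" . join($SEP, @run);
        @run = ();
    };
    for my $line (@lines) {
        my $key = $key_of->($line);
        $flush->() if defined $run_key && $key ne $run_key;    # key changed
        $run_key = $key;
        push @run, $line;
    }
    $flush->();    # last run
    return @records;
}

# After the second sort: drop the prefix and restore the original lines.
sub undecorate {
    my ($record) = @_;
    my (undef, $packed) = split /\t/, $record, 2;
    return split /\Q$SEP\E/, $packed;
}

# Toy lines "id,key,epoch" (not the real log format), already adjacent by key.
my @toy = ('a,k1,300', 'b,k1,310', 'c,k2,200', 'd,k3,250');
my @records = combine_runs(sub { (split /,/, $_[0])[1] },
                           sub { (split /,/, $_[0])[2] }, @toy);
print join(' ', undecorate($_)), "\n" for sort @records;
```

Sorting @records in Perl here stands in for the second external sort. The zero padding is what makes a plain lexical sort agree with numeric order.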
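Once that's done, you have a file with one record per line. If you've built the records correctly, you can run another sort <inputfile >outputfile to get the data in the order you want.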
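This also simplifies your programming: you don't have to worry about writing a custom sort for the data. Instead, you write a relatively simple Perl program that transforms the data so that existing tools can sort it more easily.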