Suppose I have an input file that looks like this:
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:15 > user1 has connected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
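I can remove all consecutive duplicate lines, ignoring the first two columns, with uniq -f2 file.txt, but I am looking for a way to remove only the duplicates that contain "has connected.", so that the output looks like this: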
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
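I suppose this could be done by matching a fixed string ("has connected."), but I am also interested in a command that can take a regular expression.
I looked at the answers to this question, but could not adapt the commands to work with my input.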
Answer 0: (score: 1)
$ awk -F'>' '!(/has connected/ && seen[$2]++)' file
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
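One caveat worth checking: seen[$2]++ remembers each message for the whole file, so a later "has connected." line for the same user is dropped even when it is not adjacent to the first one. If only consecutive repeats should go, a rough sketch is to compare each line with the one before it. The function name dedupe_consecutive and the pat variable are illustrative and not part of the answer above; the pattern is passed in as an awk regular expression.

```shell
# Hypothetical helper: reads log lines on stdin; $1 is an awk regex.
# A line is dropped only if it matches the regex AND its text after ">"
# is identical to that of the line immediately before it.
dedupe_consecutive() {
    awk -F'>' -v pat="$1" '
        # Print everything except a matching line that repeats its predecessor.
        !(NR > 1 && $0 ~ pat && $2 == prev) { print }
        # Remember this message (forced to a string) for the next comparison.
        { prev = $2 "" }
    '
}

# Sample run on a few lines in the format used in the question.
printf '%s\n' \
    '2016-06-03 21:00:14 > user1 has connected.' \
    '2016-06-03 21:00:15 > user1 has connected.' \
    '2016-06-03 21:00:22 > foobar disconnected.' \
    '2016-06-03 21:00:22 > foobar disconnected.' \
    '2016-06-03 21:00:31 > user1 has connected.' |
    dedupe_consecutive 'has connected'
```

In this sample the second user1 line is removed, while the user1 line at 21:00:31 is kept because the line before it is different.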
Answer 1: (score: 1)
A one-line Perl solution
perl -nE 'print unless /has connected/ && @s{/>\s+(.+)/}++' myfile.log
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
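Note that the hash slice @s{/>\s+(.+)/}++ is deliberate. A slice with a single key is normally a mistake, but here it puts the regex match in list context, so the hash key is the captured text rather than the true/false result of the match.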
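If you want something cute like what Chris Charley wrote, where a connection is reported only if the user has disconnected since their previous connection, it becomes unintelligible as a one-liner. This script will do that for you.
If you are unfamiliar with Perl: to run this on a file, change <DATA> to <> and run the program like this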
$ perl filter.pl myfile.log
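use strict;
use warnings;

my %online;    # users currently considered connected

while ( <DATA> ) {
    # Skip lines that are neither a connect nor a disconnect message
    next unless my ($name, $op) = />\s+(.+)\s+(disconnected|has connected)\./;

    if ( $op eq 'disconnected' ) {
        delete $online{$name};    # forget the user so the next connect is reported
        print;
    }
    else {
        print unless $online{$name}++;    # report only the first connect since the last disconnect
    }
}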
__DATA__
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:15 > user1 has connected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:15 > user1 disconnected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has disconnected.
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
Output
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:15 > user1 disconnected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has disconnected.
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
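For comparison, the same bookkeeping can be sketched in awk. This is a rough translation rather than the answer's own code: filter_sessions and the online array are illustrative names, and it assumes every line has the form "timestamp > name event." with no trailing whitespace.

```shell
# Hypothetical awk translation of the Perl script above; reads stdin.
filter_sessions() {
    awk -F' > ' '
        # Connect message: report it only if the user is not already online.
        $2 ~ / has connected\.$/ {
            name = $2
            sub(/ has connected\.$/, "", name)
            if (!online[name]++) print
            next
        }
        # Disconnect message: always report it and forget the user.
        $2 ~ / disconnected\.$/ {
            name = $2
            sub(/ disconnected\.$/, "", name)
            delete online[name]
            print
        }
        # Any other line is skipped, as in the Perl version.
    '
}

# Sample run: the repeated connect is dropped, the reconnect is kept.
printf '%s\n' \
    '2016-06-03 21:00:14 > user1 has connected.' \
    '2016-06-03 21:00:15 > user1 has connected.' \
    '2016-06-03 21:00:15 > user1 disconnected.' \
    '2016-06-03 21:00:16 > user1 has connected.' |
    filter_sessions
```

Like the Perl script, this reads "user2 has disconnected." as a disconnect for a user named "user2 has", so if the real log uses both spellings the name extraction may need adjusting.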
Answer 2: (score: 0)
With awk:
awk -F">" '!($2 in a) || $2 ~ /disconnected/ {a[$2]=$2; print}' < file.txt
The condition checks whether the text after ">" is not yet in the array, or whether it contains "disconnected":
!($2 in a) || $2 ~ /disconnected/
Answer 3: (score: 0)
I think this Perl solution may be what you want. I added a few more lines to the data.
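#!/usr/bin/perl
use strict;
use warnings;

my %seen;    # connect messages seen since the last non-connect line

while (<DATA>) {
    # The space before "connected" keeps "disconnected" lines from matching
    if (/ > (.+? connected)/) {
        print unless $seen{$1}++;    # print each connect message once per run
    }
    else {
        %seen = ();    # any other line ends the current run of connect messages
        print;
    }
}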
__DATA__
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:15 > user1 has connected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:31 > user1 has connected.
2016-06-03 21:00:31 > user1 has connected.
2016-06-03 21:00:34 > user1 has connected.
2016-06-03 21:00:50 > user2 has connected.
2016-06-03 21:00:51 > user2 has connected.
Prints
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:31 > user1 has connected.
2016-06-03 21:00:50 > user2 has connected.