I have the following log lines:
21:32:47 daemon DENIED: "Prog1" usera server82 (Licensed number of users already reached.)
21:32:48 daemon DENIED: "Prog1" usera server82 (Licensed number of users already reached.)
21:32:51 daemon DENIED: "Prog1" usera server39 (Licensed number of users already reached.)
21:58:38 daemon DENIED: "Prog2" userb server97 (User/host not on INCLUDE list for feature.)
21:58:38 daemon DENIED: "Prog2" userb server97 (User/host not on INCLUDE list for feature.)
21:58:38 daemon DENIED: "Prog3" userb server97 (User/host not on INCLUDE list for feature.)
21:58:40 daemon DENIED: "Prog2" userd server04 (User/host not on INCLUDE list for feature.)
22:35:59 daemon DENIED: "Prog2" userd server92 (User/host not on INCLUDE list for feature.)
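What I want to do is filter this and show only the unique lines, keeping the most recent time for each.
So the result should look like this: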
21:32:48 daemon DENIED: "Prog1" usera server82 (Licensed number of users already reached.)
21:32:51 daemon DENIED: "Prog1" usera server39 (Licensed number of users already reached.)
21:58:38 daemon DENIED: "Prog2" userb server97 (User/host not on INCLUDE list for feature.)
21:58:38 daemon DENIED: "Prog3" userb server97 (User/host not on INCLUDE list for feature.)
21:58:40 daemon DENIED: "Prog2" userd server04 (User/host not on INCLUDE list for feature.)
22:35:59 daemon DENIED: "Prog2" userd server92 (User/host not on INCLUDE list for feature.)
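As you may notice, some lines share the same user or program, but because the server or the time differs, the lines as a whole are not identical.
Answer 0: (score: 2)
I take advantage of the uniqueness of array keys. The part that varies is the timestamp, so I store it as the array value and use the current line (without the timestamp) as the array key. Each later occurrence of the same line overwrites the stored timestamp, so the last one in the file wins.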
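$ awk '
# Key: the line without its timestamp; value: the last timestamp seen for it.
{hour=$1;$1="";arr[$0]=hour}
# $0 keeps its leading space after $1 is emptied, so no separator is needed.
END{for (a in arr) {print arr[a] a}}
' file.txt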
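Output (awk does not specify the iteration order of for (a in arr), so the lines are not in chronological order):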
21:32:48 daemon DENIED: "Prog1" usera server82 (Licensed number of users already reached.)
21:58:40 daemon DENIED: "Prog2" userd server04 (User/host not on INCLUDE list for feature.)
21:58:38 daemon DENIED: "Prog3" userb server97 (User/host not on INCLUDE list for feature.)
22:35:59 daemon DENIED: "Prog2" userd server92 (User/host not on INCLUDE list for feature.)
21:58:38 daemon DENIED: "Prog2" userb server97 (User/host not on INCLUDE list for feature.)
21:32:51 daemon DENIED: "Prog1" usera server39 (Licensed number of users already reached.)
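To get the result back in chronological order, the awk output can be piped through sort. This is a minimal sketch, assuming the timestamps are zero-padded HH:MM:SS values from a single day (so lexical order matches time order) and that the input file is already chronological; the function name latest_unique is hypothetical.

```shell
# latest_unique FILE: print each distinct line (ignoring field 1) once,
# with the last timestamp seen for it, sorted by timestamp.
latest_unique() {
    awk '
    { ts = $1; $1 = ""; last[$0] = ts }    # later lines overwrite earlier ones
    END { for (k in last) print last[k] k }
    ' "$1" | sort
}
```

Usage: latest_unique file.txt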
Answer 1: (score: 1)
sort -r file.txt | uniq -f 1 | tac
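sort -r: sorts the lines by timestamp in reverse order.
uniq -f 1: skips the first field (the timestamp) when comparing and removes duplicate lines, keeping only the first occurrence of each. Because the input is reverse-sorted, that is the most recent one. Note that uniq only compares adjacent lines, so the duplicates must end up next to each other after sorting, as they do in the sample.
tac: reverses the line order, putting the result back in chronological order.
Here is the output for the sample data: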
21:32:48 daemon DENIED: "Prog1" usera server82 (Licensed number of users already reached.)
21:32:51 daemon DENIED: "Prog1" usera server39 (Licensed number of users already reached.)
21:58:38 daemon DENIED: "Prog2" userb server97 (User/host not on INCLUDE list for feature.)
21:58:38 daemon DENIED: "Prog3" userb server97 (User/host not on INCLUDE list for feature.)
21:58:40 daemon DENIED: "Prog2" userd server04 (User/host not on INCLUDE list for feature.)
22:35:59 daemon DENIED: "Prog2" userd server92 (User/host not on INCLUDE list for feature.)
You tagged this question with Linux, so I used the GNU tac utility; if you are on a Mac or BSD system, you can use tail -r instead.
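For logs where other messages can land between two copies of the same message after sorting, here is a sketch that remembers already-seen lines in awk instead of relying on adjacency. It assumes zero-padded timestamps from a single day in the first space-separated field; dedupe_latest is a hypothetical name, and a final sort is used in place of tac so the same command runs on GNU and BSD systems.

```shell
# dedupe_latest FILE: keep the newest copy of each line (ignoring the
# timestamp), even when duplicates are not adjacent after sorting.
dedupe_latest() {
    sort -r "$1" |
    awk '
    { key = $0; sub(/^[^ ]+ /, "", key) }   # key = line minus its timestamp
    !seen[key]++                            # print only the first (newest) copy
    ' |
    sort
}
```

Usage: dedupe_latest file.txt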
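Answer 2: (score: 0)
Take the log file and eliminate all duplicates while keeping the result sorted by time
Here is a script you can use; I will go through what each command does.
SCRIPT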
#!/bin/bash
rm -f "$2" 2> /dev/null
touch "$2"
# Sort the old log in place, most recent first.
cat "$1" > tmp
sort -r tmp > "$1"
rm -f tmp 2> /dev/null
while read -r line; do
    # Strip the timestamp (first field).
    line_to_find=`echo "$line"|cut -d ' ' -f2- `
    no_of_duplicated_lines=`grep "$line_to_find" "$1"|wc -l`
    if [[ "$no_of_duplicated_lines" != @(1) ]]; then
        # Duplicated: keep it only if it is not in the new log yet.
        matching_line_in_log_files=`grep "$line_to_find" "$2"`
        if [ -z "$matching_line_in_log_files" ]; then
            echo "$line" >> "$2"
        fi
    else
        echo "$line" >> "$2"
    fi
done < "$1"
# Sort the new log, most recent first.
cat "$2" > tmp
sort -r tmp > "$2"
rm -f tmp 2> /dev/null
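How to run the script
my_script <log_file> <new_log_file>
"path to the script" "path to the log file to modify" "location of the new log file"
[Step 1] Delete any existing file that has the name of the new file I want to create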
rm -f "$2" 2> /dev/null
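[Step 2] Create the new, duplicate-free log file and sort the old log file's messages by most recent time
Copy the old log file to a temporary file, then sort all the messages by time, most recent first, and redirect the sorted messages back into the old log file. Note that this overwrites the original log.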
cat "$1" > tmp
sort -r tmp > "$1"
rm -f tmp 2> /dev/null
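The fixed file name tmp can clash with an existing file in the current directory. Here is a sketch of the same in-place sort using mktemp, which is widely available but not part of POSIX; the function name sort_in_place is hypothetical.

```shell
# sort_in_place FILE: sort FILE most recent first, using a uniquely named
# temporary file instead of a fixed "tmp" in the current directory.
sort_in_place() {
    tmpfile=$(mktemp) || return 1
    sort -r -- "$1" > "$tmpfile" && cat "$tmpfile" > "$1"
    status=$?
    rm -f "$tmpfile"
    return "$status"
}
```

Usage: sort_in_place "$1". Where no temporary file is wanted at all, sort -r -o "$1" "$1" is an alternative, since POSIX allows the -o output file to be one of the input files.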
[Step 3] Read each line of your old log file
This is done with a while do loop that reads lines until the end of the file
while read -r line; do
............
............
done < "$1"
[Step 4] Strip the TIMESTAMP from each line and save the result in a variable
The messages are already sorted by most recent timestamp.
line_to_find=`echo "$line"|cut -d ' ' -f2- `
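[Step 5] Search the log files. Because the messages are already sorted by TIMESTAMP, most recent first, the first copy of a line encountered (ignoring its TIMESTAMP) is the most recent one
I first search the old log for duplicates of the line. If duplicates are found, I check the new log file to see whether the line has already been written there, and write it only if it has not.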
no_of_duplicated_lines=`grep "$line_to_find" "$1"|wc -l`
if [[ "$no_of_duplicated_lines" != @(1) ]]; then
matching_line_in_log_files=`grep "$line_to_find" "$2"`
if [ -z "$matching_line_in_log_files" ]; then
echo "$line" >> "$2"
fi
else
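[Step 6] If NO. OF LINES = 1, redirect the line to the new log file
If only one matching line is found in the old log, it is not a duplicate, so simply redirect that line from the old log to the new log.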
echo "$line" >> "$2"
fi
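One caveat about Steps 5 and 6: grep treats $line_to_find as a regular expression and matches substrings, so a character such as . in the log text matches any character. The following small sketch shows literal, whole-line matching with grep -F -x; the helper name count_exact is illustrative only and is not part of the script above.

```shell
# count_exact TEXT FILE: count the lines in FILE whose content after the
# timestamp is exactly TEXT, with no regex interpretation.
count_exact() {
    cut -d ' ' -f2- "$2" | grep -c -F -x -- "$1"
}
```

Usage: no_of_duplicated_lines=$(count_exact "$line_to_find" "$1")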
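[Step 7] Sort the new log file by timestamp, most recent first
Copy the new log to a tmp file, sort it most recent first, and write the sorted output back to the new log. Delete the tmp file once processing is done.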
cat "$2" > tmp
sort -r tmp > "$2"
rm -f tmp 2> /dev/null
Output, sorted by time (most recent first)
22:35:59 daemon DENIED: "Prog2" userd server92 (User/host not on INCLUDE list for feature.)
21:58:40 daemon DENIED: "Prog2" userd server04 (User/host not on INCLUDE list for feature.)
21:58:38 daemon DENIED: "Prog3" userb server97 (User/host not on INCLUDE list for feature.)
21:58:38 daemon DENIED: "Prog2" userb server97 (User/host not on INCLUDE list for feature.)
21:32:51 daemon DENIED: "Prog1" usera server39 (Licensed number of users already reached.)
21:32:48 daemon DENIED: "Prog1" usera server82 (Licensed number of users already reached.)