Question

This is a sample of the text file. I need to count the occurrences of the word "Id", grouped by second, using the string before the pipe ("|").

2019-02-10 12:00:03.448|Id: 26102338
2019-02-10 12:00:03.448|Id: 25941418
2019-02-10 12:00:03.449|Id: 25827373
2019-02-10 12:00:03.449|Id: 26102038
2019-02-10 12:00:03.449|Id: 25929358

2019-02-10 12:00:04.382 | =====================================Start fetching=====================================
2019-02-10 12:00:04.451 |
2019-02-10 12:00:04.426|Id: 25713118
2019-02-10 12:00:04.426|Id: 26076208
2019-02-10 12:00:04.426|Id: 26079643
2019-02-10 12:00:04.426|Id: 26085973
2019-02-10 12:00:04.426|Id: 26090023
2019-02-10 12:00:04.426|Id: 26130133
2019-02-10 12:00:04.426|Id: 25954018
2019-02-10 12:00:04.427|Id: 25951468
2019-02-10 12:00:04.427|Id: 26136148
2019-02-10 12:00:04.427|Id: 26103013
2019-02-10 12:00:04.427|Id: 25806433

I need the output to look like this:

Time               |Count(Id)  
2019-02-10 12:00:03|5    
2019-02-10 12:00:04|11

Can anyone help?

Answer 1

If there is always an Id at the end of each line, and you don't mind the format being the other way round, it's simple:

grep 'Id:' /tmp/data.txt | cut -f 1 -d '.' | uniq -c
      5 2019-02-10 12:00:03
     11 2019-02-10 12:00:04

grep keeps only the lines containing Id:, which drops the blank lines.
cut selects the field before the dot (i.e. the time without the ms).
uniq -c counts the total number of each occurrence.

(If the file isn't always in order, you may also need a sort before the uniq, since uniq only merges adjacent identical lines.)

To swap the columns and add the pipe so that the output matches the requested format, you could pipe the output through sed.
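The sed command from this answer is not preserved here, so the following is only a minimal sketch of what such a step might look like, not the original author's command. It assumes uniq -c prints the count, a single space, and then the timestamp, with optional leading padding. The sample data and the variable names (data, result) are made up for illustration.

```shell
# Sample data: a shortened, hypothetical copy of the log from the question.
data=$(mktemp)
cat > "$data" <<'EOF'
2019-02-10 12:00:03.448|Id: 26102338
2019-02-10 12:00:03.449|Id: 25827373

2019-02-10 12:00:04.382 | =====Start fetching=====
2019-02-10 12:00:04.426|Id: 25713118
2019-02-10 12:00:04.426|Id: 26076208
2019-02-10 12:00:04.427|Id: 25951468
EOF

# uniq -c prints "<padding><count> <key>". The sed expression captures the
# count and the key, then emits them swapped with a pipe in between.
result=$(grep 'Id:' "$data" | cut -f 1 -d '.' | uniq -c |
  sed -E 's/^ *([0-9]+) (.*)$/\2|\1/')

echo "$result"
rm -f "$data"
```

The -E flag (extended regular expressions) is accepted by both GNU and BSD sed, so the capture groups need no backslashes.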

Answer 2

data.txt

2019-02-10 12:00:03.448|Id: 26102338
2019-02-10 12:00:03.448|Id: 25941418
2019-02-10 12:00:03.449|Id: 25827373
2019-02-10 12:00:03.449|Id: 26102038
2019-02-10 12:00:03.449|Id: 25929358

2019-02-10 12:00:04.426|Id: 25713118
2019-02-10 12:00:04.426|Id: 26076208
2019-02-10 12:00:04.426|Id: 26079643
2019-02-10 12:00:04.426|Id: 26085973
2019-02-10 12:00:04.426|Id: 26090023
2019-02-10 12:00:04.426|Id: 26130133
2019-02-10 12:00:04.426|Id: 25954018
2019-02-10 12:00:04.427|Id: 25951468
2019-02-10 12:00:04.427|Id: 26136148
2019-02-10 12:00:04.427|Id: 26103013
2019-02-10 12:00:04.427|Id: 25806433

2019-02-10 12:00:03.427|Id: 25806433

Command:

grep 'Id:' data.txt | cut -f 1 -d '.' | sort | uniq -c | awk '{print $2" "$3" | "$1}'

Sort before counting so that out-of-order timestamps (like the last line of data.txt) are still grouped together.

Output:

2019-02-10 12:00:03 | 6
2019-02-10 12:00:04 | 11
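The output above has no header line and puts spaces around the pipe, while the question asks for a "Time |Count(Id)" header and no padding. A possible variant, offered as a sketch rather than as part of the original answer, lets awk print the header in a BEGIN block. The sample file and the names data and result are hypothetical, and the header width of 19 is an assumption based on the length of the timestamp.

```shell
# Hypothetical sample: a shortened data.txt with one out-of-order line at the end.
data=$(mktemp)
cat > "$data" <<'EOF'
2019-02-10 12:00:03.448|Id: 26102338
2019-02-10 12:00:03.449|Id: 25827373

2019-02-10 12:00:04.426|Id: 25713118
2019-02-10 12:00:04.427|Id: 25951468

2019-02-10 12:00:03.427|Id: 25806433
EOF

# Same pipeline as above; awk first prints a header padded to the width of
# the timestamp (19 characters), then each "date time|count" row.
result=$(grep 'Id:' "$data" | cut -f 1 -d '.' | sort | uniq -c |
  awk 'BEGIN { printf "%-19s|%s\n", "Time", "Count(Id)" }
       { print $2 " " $3 "|" $1 }')

echo "$result"
rm -f "$data"
```

Because awk splits on runs of whitespace by default, the leading padding that uniq -c adds before the count does not affect the field numbers.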

Bash - get the count of a word grouped by second

2 answers: