Bash - count occurrences of a word grouped by second

Time: 2019-02-11 10:12:51

Tags: bash shell group-by count centos7

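This is a sample of the text file. I need to count the occurrences of the word "Id", grouped by second, where the second is taken from the string before the pipe ("|").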

2019-02-10 12:00:03.448|Id: 26102338
2019-02-10 12:00:03.448|Id: 25941418
2019-02-10 12:00:03.449|Id: 25827373
2019-02-10 12:00:03.449|Id: 26102038
2019-02-10 12:00:03.449|Id: 25929358

2019-02-10 12:00:04.382 | =====================================Start fetching=====================================
2019-02-10 12:00:04.451 |
2019-02-10 12:00:04.426|Id: 25713118
2019-02-10 12:00:04.426|Id: 26076208
2019-02-10 12:00:04.426|Id: 26079643
2019-02-10 12:00:04.426|Id: 26085973
2019-02-10 12:00:04.426|Id: 26090023
2019-02-10 12:00:04.426|Id: 26130133
2019-02-10 12:00:04.426|Id: 25954018
2019-02-10 12:00:04.427|Id: 25951468
2019-02-10 12:00:04.427|Id: 26136148
2019-02-10 12:00:04.427|Id: 26103013
2019-02-10 12:00:04.427|Id: 25806433

I need output like this:

Time               |Count(Id)  
2019-02-10 12:00:03|5    
2019-02-10 12:00:04|11

Can anyone help?

2 answers:

Answer 0: (score: 1)

If there is always an Id at the end of each line, and you don't mind the columns being reversed, this is simple:

grep 'Id:' /tmp/data.txt | cut -f 1 -d '.' | uniq -c
      5 2019-02-10 12:00:03
     11 2019-02-10 12:00:04

  1. grep drops the blank lines (and any other line that does not contain Id:).

  2. cut selects the field before the dot (i.e. the time without the ms).

  3. uniq counts the total of each occurrence.

(If the file is not always in order, you may also need a sort before the uniq.)
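The three steps can be followed one stage at a time. This is only a minimal sketch: it writes a shortened copy of the question's sample to a temporary file (an assumption, standing in for /tmp/data.txt), the stage1/stage2/stage3 variable names are made up for illustration, and it includes the optional sort.

```shell
#!/usr/bin/env bash
# Shortened sample from the question, written to a hypothetical temp file.
data_file=$(mktemp)
cat > "$data_file" <<'EOF'
2019-02-10 12:00:03.448|Id: 26102338
2019-02-10 12:00:03.449|Id: 25827373

2019-02-10 12:00:04.451 |
2019-02-10 12:00:04.426|Id: 25713118
2019-02-10 12:00:04.426|Id: 26076208
2019-02-10 12:00:04.427|Id: 25951468
EOF

# Stage 1: keep only lines containing "Id:" (blank and non-Id lines are dropped).
stage1=$(grep 'Id:' "$data_file")

# Stage 2: keep the text before the first '.', i.e. the timestamp without ms.
stage2=$(printf '%s\n' "$stage1" | cut -f 1 -d '.')

# Stage 3: sort so equal timestamps are adjacent, then count each distinct one.
stage3=$(printf '%s\n' "$stage2" | sort | uniq -c)

printf '%s\n' "$stage3"
rm -f "$data_file"
```

With this shortened sample the counts are 2 for 12:00:03 and 3 for 12:00:04; the amount of padding in front of each count depends on the uniq implementation.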

To reverse the columns and add the pipe in the requested format, you could pipe the output through sed.
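One possible sed expression for that reformatting step is sketched below. It is an assumption about what such a command could look like, not the answer's own command; format_counts is a hypothetical helper name, and the input lines imitate uniq -c output using the counts from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "  <count> <timestamp>" lines (uniq -c style)
# into "<timestamp>|<count>" lines using a basic regular expression.
format_counts() {
  sed 's/^ *\([0-9][0-9]*\) \(.*\)$/\2|\1/'
}

# Sample input imitating `uniq -c` output for the question's data.
formatted=$(printf '%s\n' \
  '      5 2019-02-10 12:00:03' \
  '     11 2019-02-10 12:00:04' | format_counts)

printf '%s\n' "$formatted"
# prints:
# 2019-02-10 12:00:03|5
# 2019-02-10 12:00:04|11
```

In a full pipeline the helper would simply be appended after uniq -c. The header line from the requested output is not produced here and would have to be printed separately.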

Answer 1: (score: -1)

data.txt

2019-02-10 12:00:03.448|Id: 26102338
2019-02-10 12:00:03.448|Id: 25941418
2019-02-10 12:00:03.449|Id: 25827373
2019-02-10 12:00:03.449|Id: 26102038
2019-02-10 12:00:03.449|Id: 25929358

2019-02-10 12:00:04.426|Id: 25713118
2019-02-10 12:00:04.426|Id: 26076208
2019-02-10 12:00:04.426|Id: 26079643
2019-02-10 12:00:04.426|Id: 26085973
2019-02-10 12:00:04.426|Id: 26090023
2019-02-10 12:00:04.426|Id: 26130133
2019-02-10 12:00:04.426|Id: 25954018
2019-02-10 12:00:04.427|Id: 25951468
2019-02-10 12:00:04.427|Id: 26136148
2019-02-10 12:00:04.427|Id: 26103013
2019-02-10 12:00:04.427|Id: 25806433

2019-02-10 12:00:03.427|Id: 25806433

Command:

grep 'Id:' data.txt | cut -f 1 -d '.' | sort | uniq -c | awk '{print $2" "$3" | "$1}'
  

Sort before counting, so that out-of-order timestamps do not split one second into several groups.

Output:

2019-02-10 12:00:03 | 6
2019-02-10 12:00:04 | 11
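
The sort matters because uniq only merges lines that are adjacent. A small sketch of the difference, using three made-up, already truncated timestamps in deliberately mixed order (the variable names are illustrative only):

```shell
#!/usr/bin/env bash
# Three truncated timestamps, deliberately out of order.
stamps=$(printf '%s\n' \
  '2019-02-10 12:00:03' \
  '2019-02-10 12:00:04' \
  '2019-02-10 12:00:03')

# Without sort: uniq sees three separate runs, so 12:00:03 is reported twice.
unsorted_groups=$(printf '%s\n' "$stamps" | uniq -c | wc -l | tr -d ' ')

# With sort: equal timestamps become adjacent and collapse into two groups.
sorted_groups=$(printf '%s\n' "$stamps" | sort | uniq -c | wc -l | tr -d ' ')

echo "groups without sort: $unsorted_groups"
echo "groups with sort:    $sorted_groups"
```

This is why the stray 12:00:03.427 line at the end of data.txt is still counted in the 12:00:03 total of 6 above.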