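Splitting data using shell

Time: 2016-07-27 20:53:56

Tags: shell unix hadoop

I am new to shell scripting. I need a shell script that extracts the data between "run" and "Automatic Match Counts" so that it can be processed as semi-structured data. Please advise.

2 answers:

Answer 0: (score: 1)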

sed -n '/run/,/Automatic/ {//!p }' test.txt

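This selects every line from the first line matching run through the next line matching Automatic (the comma defines the range). The empty pattern // reuses the most recently applied regex, so //!p skips the two boundary lines, that is the run line and the Automatic Match Counts line, and prints only what lies between them.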
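To make the behaviour of the range and of //!p concrete, here is a minimal sketch run against a small, hypothetical sample file. The real test.txt is not shown in the question, so the sample lines and the variable names sample and between are assumptions made for illustration only.

```shell
#!/bin/bash
# Hypothetical sample input; the real test.txt is not shown in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
header line
Batch run summary
Started: 08/02/2013 16:04:44
Finished: 08/02/2013 16:21:23
Automatic Match Counts
Verified Correct: 32426
EOF

# /run/,/Automatic/ selects from the first line containing "run" through
# the next line containing "Automatic". Inside the braces, the empty
# regex // reuses the last regex applied, so //!p prints only the lines
# that are not one of the two boundary lines.
between=$(sed -n '/run/,/Automatic/{//!p;}' "$sample")
printf '%s\n' "$between"

rm -f "$sample"
```

With this sample, only the Started and Finished lines are printed. Note that the patterns are plain substring matches, so any earlier line that happens to contain run would start the range sooner than intended.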


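Answer 1: (score: 1)

Use sed -n '/run/,/Automatic/p' filename.txt|sed '1d;$d'|sed '$d;s/        //g' to clean up the data. The first sed prints the range including both boundary lines, the second drops the first and last of those lines, and the third drops the new last line and removes the indentation (runs of eight spaces).

Shell script - split.sh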

#!/bin/bash
sed -n '/run/,/Automatic/p' "$1" | sed '1d;$d' | sed '$d;s/        //g'

Run it as follows to get the output both on the console and in a file:

shell> ./split.sh test.txt |tee splitted.dat
United Kingdom:       21/09/2012
Started:      08/02/2013 16:04:44
Finished:     08/02/2013 16:21:23
Time to process:      0 days 0 hours 16 mins 39 secs
Records processed:    37497
Throughput:   135124 records/hour
Time per record:      0.0266 secs

The output is also stored in the file splitted.dat:

shell> cat splitted.dat 
United Kingdom:       21/09/2012
Started:      08/02/2013 16:04:44
Finished:     08/02/2013 16:21:23
Time to process:      0 days 0 hours 16 mins 39 secs
Records processed:    37497
Throughput:   135124 records/hour
Time per record:      0.0266 secs
shell> 
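The three-stage pipeline can be tried on a small, hypothetical input as in the sketch below. Two things here are assumptions rather than part of the answer: the sample lines (including the blank line before Automatic Match Counts, which would explain why two trailing lines are removed), and the anchored substitution s/^[[:space:]]*//, which is swapped in for the eight-space pattern so that only leading whitespace is stripped regardless of whether the indentation is spaces or tabs.

```shell
#!/bin/bash
# Hypothetical sample input; the layout of the real file is assumed.
sample=$(mktemp)
printf '%s\n' \
  'preamble' \
  'Batch run report' \
  '        Started: 08/02/2013 16:04:44' \
  '        Records processed: 37497' \
  '' \
  'Automatic Match Counts' \
  '        Verified Correct: 32426 (86.5%)' > "$sample"

# Stage 1: print the range, boundary lines included.
# Stage 2: drop the first line ("...run...") and the last ("Automatic...").
# Stage 3: drop the new last line (the blank one) and strip leading
#          whitespace (a variant of the eight-space pattern in split.sh).
cleaned=$(sed -n '/run/,/Automatic/p' "$sample" \
  | sed '1d;$d' \
  | sed '$d;s/^[[:space:]]*//')
printf '%s\n' "$cleaned"

rm -f "$sample"
```

If the real file has no blank line before the closing boundary, the $d in the third stage would delete a data line, so that stage should be adjusted to the actual input.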

Update

#!/bin/bash
# -n                     - suppress automatic printing, so only p prints
# p                      - print the lines selected by the address
# !p                     - print the lines NOT selected by the address (opposite of p)
# | (pipe)               - pass the output of one command to the next
# $d                     - delete the last line
# 1d                     - delete the first line (Nd deletes the Nth line)
# '/run/,/Automatic/!p'  - print every line outside the 'run' to 'Automatic' range
# sed '1d;s/        //g' - on that output, delete the first line and remove each run of eight spaces

sed -n '/run/,/Automatic/!p' "$1" | sed '1d;s/        //g'

Output:

Verified Correct:     32426 (86.5%)
Good Match:    2102 ( 5.6%)
Good Premise Partial:   862 ( 2.3%)
Tentative Match:       1039 ( 2.8%)
Poor Match:       4 ( 0.0%)
Multiple Matches: 7 ( 0.0%)
Partial Match:  872 ( 2.3%)
Foreign Address:  2 ( 0.0%)
Unmatched:      183 ( 0.5%)
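The inverted selection used in the update can be checked the same way. The sketch below is illustrative only: the sample lines and the variable names are hypothetical, and the anchored s/^[[:space:]]*// is again used in place of the eight-space pattern so that only leading whitespace is removed.

```shell
#!/bin/bash
# Hypothetical sample input; the real file is not shown in the question.
sample=$(mktemp)
printf '%s\n' \
  'Report title' \
  'Batch run report' \
  '    Started: 08/02/2013 16:04:44' \
  'Automatic Match Counts' \
  '    Verified Correct: 32426 (86.5%)' \
  '    Unmatched: 183 ( 0.5%)' > "$sample"

# !p prints every line that is NOT inside the run..Automatic range,
# which leaves the title line and the match-count lines.
# The second sed drops the title (1d) and strips leading whitespace.
rest=$(sed -n '/run/,/Automatic/!p' "$sample" | sed '1d;s/^[[:space:]]*//')
printf '%s\n' "$rest"

rm -f "$sample"
```

The 1d step assumes exactly one line precedes the range. If the real file has more or fewer header lines, that address needs to change accordingly.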