How do I cut this text into separate lines in bash?

Date: 2014-06-13 16:32:49

Tags: linux bash

I have a text file like this:

10/22/2013  00:11:12 ioy_I2           dtgfd14_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:11:19 ioy_I2           dtgfd24_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:11:26 ioy_I2           dtgfd23_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:11:32 ioy_J2           dtgfd13_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:11:39 ioy_J2           dtgfd14_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:11:46 ioy_J2           dtgfd24_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:11:53 ioy_J2           dtgfd23_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:12:00 ioy_L2           dtgfd13_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:12:08 ioy_L2           dtgfd14_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:12:15 ioy_L2           dtgfd24_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:12:22 ioy_L2           dtgfd23_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:12:29 ioy_N2           dtgfd13_1           TC  (OFF)weqw########��kjhk6           10/22/2013  00:12:37 ioy_N2           dtgfd14_1           TC  

I need to clean up this file. The original is a binary file, and I want to convert it into a log file like this:

10/22/2013  00:11:12 ioy_I2           dtgfd14_1           TC  (OFF)weqw  kjhk6           
10/22/2013  00:11:19 ioy_I2           dtgfd24_1           TC  (OFF)weqw  kjhk6           
10/22/2013  00:11:26 ioy_I2           dtgfd23_1           TC  (OFF)weqw  kjhk6           
10/22/2013  00:11:32 ioy_J2           dtgfd13_1           TC  (OFF)weqw  kjhk6           
10/22/2013  00:11:39 ioy_J2           dtgfd14_1           TC  (OFF)weqw  kjhk6           
10/22/2013  00:11:46 ioy_J2           dtgfd24_1           TC  (OFF)weqw  

2 answers:

Answer 0 (score: 0)

Try this GNU sed command:

sed -ri 's/(weqw)########..(k)/\1 \2/g; s~10/22/2013~\n10/22/2013~g' file

Example:

$ sed -r 's/(weqw)########..(k)/\1 \2/g' file | sed 's~10/22/2013~\n10/22/2013~g'

10/22/2013  00:11:12 ioy_I2           dtgfd14_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:11:19 ioy_I2           dtgfd24_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:11:26 ioy_I2           dtgfd23_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:11:32 ioy_J2           dtgfd13_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:11:39 ioy_J2           dtgfd14_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:11:46 ioy_J2           dtgfd24_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:11:53 ioy_J2           dtgfd23_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:12:00 ioy_L2           dtgfd13_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:12:08 ioy_L2           dtgfd14_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:12:15 ioy_L2           dtgfd24_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:12:22 ioy_L2           dtgfd23_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:12:29 ioy_N2           dtgfd13_1           TC  (OFF)weqw kjhk6           
10/22/2013  00:12:37 ioy_N2           dtgfd14_1           TC
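The example output above begins with an empty line, because the second substitution also puts a newline in front of the very first date. A further caveat: in a UTF-8 locale, GNU sed's `.` may not match bytes that are not valid UTF-8, so `..` can fail to consume the two garbage bytes (shown as �� in the question). The sketch below is a hedged variant, assuming GNU sed: it forces the C locale, strips the leading newline, and keeps two spaces to match the desired output. `clean_log` is a hypothetical helper name, and the bytes `\377\376` are arbitrary stand-ins for the real garbage.

```shell
#!/bin/sh
# Hedged sketch of answer 0's cleanup; assumes GNU sed (\n in the regex and
# replacement is a GNU extension). clean_log is a hypothetical helper name.
clean_log() {
    # LC_ALL=C makes sed treat the input as single bytes, so each `.` can
    # match one raw byte even when that byte is not valid UTF-8.
    # 1) drop "########" plus the two garbage bytes, leaving two spaces;
    # 2) insert a newline before every date;
    # 3) remove the newline that step 2 adds before the first record.
    LC_ALL=C sed -E '
        s/(weqw)########..(k)/\1  \2/g
        s~10/22/2013~\n10/22/2013~g
        s/^\n//
    ' "$@"
}

# Usage on a tiny sample; \377\376 are arbitrary stand-in garbage bytes.
printf '10/22/2013 a (OFF)weqw########\377\376kjhk6 10/22/2013 b (OFF)weqw########\377\376kjhk6\n' | clean_log
```

The usage line prints one line per record. To clean a real file, pass its name instead of piping, e.g. `clean_log file > file.log`.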

Answer 1 (score: 0)

This looks like a job for sed. Just tell it to insert a newline before a particular pattern (here, the date):

sed 's|\([0-1][0-9]/[0-3][0-9]/2013\)|\n\1|g' myfile > mynewfile

How it works:

sed                        the stream editor. Learn more with "man sed"
s                          the sed command we're running: "substitute"
\( \)                      designates a capture group so we can reference it with \1 later
[0-1][0-9]/[0-3][0-9]/2013 matches dates of the form MM/DD/2013. Modify it to suit your needs.
\n\1                       replace the matched pattern with itself, prefixed by a newline (GNU sed)
g                          global: replace every match on the line, not just the first
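A bracket expression always matches exactly one character, which is why a two-digit day needs two of them: `[1-31]` is the set {1, 2, 3}, not the numbers 1 to 31, so it can never match the `22` in `10/22/2013`. A quick check with grep on a few sample dates (the dates and the helper name `count_matches` are made up for illustration):

```shell
#!/bin/sh
# Made-up sample dates; count_matches is a hypothetical helper.
dates='10/22/2013
01/05/2013
12/31/2013'

# grep -c prints how many input lines contain a match for the pattern.
count_matches() {
    printf '%s\n' "$dates" | grep -c -- "$1"
}

# [1-31] is one character from {1,2,3}, so no two-digit day matches: prints 0
count_matches '[0-1][0-9]/[1-31]/2013'
# One bracket expression per digit matches days 00-39: prints 3
count_matches '[0-1][0-9]/[0-3][0-9]/2013'
```

The two-bracket version is still loose (it accepts impossible dates such as 19/39/2013), which is usually fine for splitting a log on record boundaries.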

Note that I used a pipe (|) instead of a slash (/) as the delimiter, which sed allows. That way I don't have to escape every slash in the date regex.
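To see what the delimiter choice saves, here is the same GNU sed substitution written both ways on a made-up line. With `/` as the delimiter, every literal slash in the date has to be escaped as `\/`. The function names `with_pipe` and `with_slash` are hypothetical.

```shell
#!/bin/sh
# Made-up input with two dates run together on one line.
line='x 10/22/2013 a 10/23/2013 b'

# | as the delimiter: slashes in the date pattern are ordinary characters.
with_pipe() {
    printf '%s\n' "$line" | sed 's|\([0-1][0-9]/[0-3][0-9]/2013\)|\n\1|g'
}

# / as the delimiter: each literal slash must be written as \/.
with_slash() {
    printf '%s\n' "$line" | sed 's/\([0-1][0-9]\/[0-3][0-9]\/2013\)/\n\1/g'
}

with_pipe
with_slash
```

Both functions print the same three lines; the `|` version is simply easier to read.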

There are plenty of regex resources out there, but I happen to like this one.