Extracting lines from a file in bash

Time: 2014-04-28 20:47:20

Tags: bash extract

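I have a file like this

I want to extract the lines of 0s and 1s (all such lines in the file) into a separate file. The sequence does not have to start with 0; it can also start with 1. However, such a line always comes directly after a (SITE:) line. In addition, I want to extract the SITE lines themselves into a separate file. Can someone tell me whether this is doable in bash?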

3 answers:

Answer 0 (score: 1)

You can try the following:

$ egrep -o "^(0|1)+$" test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
$ grep "^SITE:" test.txt > test3.txt
$ cat test3.txt
SITE:   0    0.000340988542    0.0357651018
SITE:   1    0.000529755514   0.00324293642
SITE:   2    0.000577745511     0.052214098

Another solution, using only bash:

$ while read; do [[ $REPLY =~ ^(0|1)+$ ]] && echo "$REPLY";  done < test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
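
The same loop idea can write both output files in a single pass. The following is only a sketch under assumptions: the sample input is hypothetical (the real file from the question is not shown), the output names bits.txt and sites.txt are made up for illustration, and `case` patterns are used in place of `[[ =~ ]]` so that it also runs in a plain POSIX shell.

```shell
# Hypothetical sample input; the real file from the question is not shown.
tmp=$(mktemp -d)
cat > "$tmp/test.txt" <<'EOF'
SITE:   0    0.000340988542    0.0357651018
0000101000
SITE:   1    0.000529755514   0.00324293642
1000000100
EOF

# One pass over the file: SITE: lines go to fd 4, digit-only lines to fd 3.
while IFS= read -r line; do
  case $line in
    SITE:*)      printf '%s\n' "$line" >&4 ;;  # line starts with SITE:
    ''|*[!01]*)  ;;                            # empty, or contains a non-0/1 character: skip
    *)           printf '%s\n' "$line" >&3 ;;  # only 0s and 1s remain
  esac
done < "$tmp/test.txt" 3> "$tmp/bits.txt" 4> "$tmp/sites.txt"

cat "$tmp/bits.txt"
```

`IFS= read -r` keeps leading whitespace and backslashes intact, which a bare `read` does not guarantee.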

To remove the leading 0 characters from each line:

$ egrep "^(0|1)+$" test.txt | sed "s/^0\{1,\}//g" > test2.txt
$ cat test2.txt
1010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000
11010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
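
Filtering and stripping can also be done by sed alone. This is a sketch on a hypothetical sample, not the original data; it also shows a side effect worth knowing about: a line consisting only of zeros becomes an empty line once its leading zeros are removed.

```shell
# Hypothetical sample input, including one all-zero line.
tmp=$(mktemp -d)
printf '%s\n' 'SITE:   0    0.1    0.2' '0001010' '0000' '1100' > "$tmp/test.txt"

# -n suppresses automatic printing; for lines made only of 0s and 1s,
# strip the leading zeros and print the result.
sed -n '/^[01][01]*$/{s/^0*//;p;}' "$tmp/test.txt" > "$tmp/test2.txt"

cat "$tmp/test2.txt"
```

Here test2.txt ends up with three lines: 1010, an empty line (from 0000), and 1100.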

Update: for the new file format given in the comments:

$ egrep "^SITE:" test.txt|egrep -o "(0|1)+$"|sed "s/^0\{1,\}//g" > test2.txt
$ cat test2.txt
100000000000000000000001000001000000000000000000000000000000000000
1010010010000000000111101000010000001001010111111100000000000010010001101010100011101011110011100
10000000000
$ egrep "^SITE:" test.txt|sed "s/[01\ ]\{1,\}$//g" > test3.txt
$ cat test3.txt
SITE:   967         0.189021866    0.0169990123
SITE:   968         0.189149593     0.246619149
SITE:   969         0.189172266  6.84752689e-05
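
One caveat with the last sed command: it removes every trailing 0, 1 and space, so a numeric field that happens to end in 0 or 1 would be shortened as well. A possible alternative is sketched below with awk. It assumes, since the new format itself is not shown here, that the bit string is the last whitespace-separated field of each SITE: line; the sample data and the variable names bits and sites are hypothetical.

```shell
tmp=$(mktemp -d)
# Hypothetical sample: the bit string is assumed to be the last field.
cat > "$tmp/test.txt" <<'EOF'
SITE:   967         0.189021866    0.0169990120   0000100
SITE:   968         0.189149593     0.246619149   0110010
EOF

awk -v bits="$tmp/test2.txt" -v sites="$tmp/test3.txt" '
/^SITE:/ {
  b = $NF                  # last whitespace-separated field: the bit string
  sub(/^0+/, "", b)        # drop leading zeros, as the sed step does
  print b > bits
  sub(/[ \t]+[01]+$/, "")  # remove only the final bit field from the line
  print > sites
}' "$tmp/test.txt"

cat "$tmp/test2.txt" "$tmp/test3.txt"
```

With this sample the value 0.0169990120 keeps its final 0 in test3.txt.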

Answer 1 (score: 1)

"In addition, I want to extract the SITE lines themselves into a separate file."

That part is simple:

grep '^SITE:' infile > outfile.site

Extracting the lines that come after them is slightly harder:

grep --after-context=1 '^SITE:' infile \
    | grep '^[01]*$' \
    > outfile.nr

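--after-context (or -A) specifies how many lines after each matching line are printed as well. We then use a second grep to print only that following line, and not the line that actually matched (nor the separator that grep places between groups of matches when after-context is specified).

Alternatively, you can use the following to match the digit lines: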

grep '^[01]*$' infile > outfile.nr

This is easier, but it finds all lines consisting only of 0s and 1s, regardless of whether they come after a line starting with SITE:.
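
The difference between the two approaches can be seen on a small sample. This is a sketch with made-up data and made-up output names; it uses `^[01][01]*$` rather than `^[01]*$` because the latter also matches empty lines.

```shell
tmp=$(mktemp -d)
# Hypothetical input: the first digit line does NOT follow a SITE: line.
cat > "$tmp/infile" <<'EOF'
0101
SITE:   0    0.000340988542    0.0357651018
0011
some other line
SITE:   1    0.000529755514   0.00324293642
1100
EOF

# Context-aware: only digit lines directly after a SITE: line.
grep -A 1 '^SITE:' "$tmp/infile" | grep '^[01][01]*$' > "$tmp/after_site.nr"

# Context-free: every digit-only line, wherever it occurs.
grep '^[01][01]*$' "$tmp/infile" > "$tmp/all.nr"

cat "$tmp/after_site.nr"
```

after_site.nr contains 0011 and 1100, while all.nr additionally contains the stray 0101.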

Answer 2 (score: 1)

Here is a simple awk solution that matches all lines starting with SITE: and prints the respective next line:

awk '/^SITE:/ { if (getline) print }'  infile > outfile

Simply omit the { ... } block to extract the lines that themselves start with SITE: into a separate file:

awk '/^SITE:/' infile > outfile

If you want to combine the two operations:

outfile1 and outfile2 are the names of the 2 output files, passed to awk as the variables f1 and f2:

awk -v f1=outfile1 -v f2=outfile2 \
  '/^SITE:/ { print > f1; if (getline) print > f2 }'  infile
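
A variant of the combined command that avoids getline is sketched below. It is not the answer's method, only an alternative under assumptions: the flag name want, the file names and the sample input are all hypothetical. A flag is set on each SITE: line, and the next non-SITE line is written to the second file.

```shell
tmp=$(mktemp -d)
# Hypothetical sample input; the real file from the question is not shown.
cat > "$tmp/infile" <<'EOF'
SITE:   0    0.000340988542    0.0357651018
0011
some other line
SITE:   1    0.000529755514   0.00324293642
1100
EOF

awk -v f1="$tmp/outfile1" -v f2="$tmp/outfile2" '
/^SITE:/ { print > f1; want = 1; next }   # SITE: line: save it, remember to take the next line
want     { print > f2; want = 0 }         # line right after a SITE: line
' "$tmp/infile"

cat "$tmp/outfile1" "$tmp/outfile2"
```

Unlike the getline version, two consecutive SITE: lines would both end up in outfile1 here, since the second one is matched by the first rule instead of being consumed as the "next line".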