Append the first line of a paragraph to multiple lines

Time: 2011-11-04 23:21:18

Tags: bash sed awk

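I have a tab-delimited file that contains a date, a header row, some values, and a blank line, and this pattern repeats many times. The file looks like this: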

November 3, 2011
column_name1    column_name2    column_name3    column_name4
value   value   value   value
value   value   value   value
value   value   value   value
value   value   value   value

November 4, 2011
column_name1    column_name2    column_name3    column_name4
value   value   value   value
value   value   value   value
value   value   value   value
value   value   value   value

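I am trying to find the right sed or awk command to transform the data so that it can be used to create charts. I would like the transformed data to look like this: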

date    column_name1    column_name2    column_name3    column_name4
November 3, 2011    value   value   value   value
November 3, 2011    value   value   value   value
November 3, 2011    value   value   value   value
November 3, 2011    value   value   value   value

date    column_name1    column_name2    column_name3    column_name4
November 4, 2011    value   value   value   value
November 4, 2011    value   value   value   value
November 4, 2011    value   value   value   value
November 4, 2011    value   value   value   value

3 answers:

Answer 0: (score: 3)

Using 'sed'

Contents of 'infile':

$ cat infile
November 3, 2011
column_name1    column_name2    column_name3    column_name4
value   value   value   value
value   value   value   value
value   value   value   value
value   value   value   value

November 4, 2011
column_name1    column_name2    column_name3    column_name4
value   value   value   value
value   value   value   value
value   value   value   value
value   value   value   value

Contents of the sed script:

$ cat script.sed
## When line has a date.
/[0-9]\+,[ ]*[0-9]\{4\}/ {
        ## Save date to HS (hold space).
        h
        ## Read next line (header).
        N
        ## Insert 'date' string at the beginning of the line.
        s/.*\n/date\t/
        ## Print and read next line.
        P
        n
}

## Process next line if blank line found.
/^[ \t]*$/ {
        p
        d
}

## Process data inserting the date in the beginning.
## Put at the end of PS (pattern space) the date saved before and exchange it 
## with the rest of the line. Print after that.
G
s/^\(.*\)\n\(.*\)$/\2\t\1/
p

Running the script:

$ sed -n -f script.sed infile
date    column_name1    column_name2    column_name3    column_name4
November 3, 2011        value   value   value   value
November 3, 2011        value   value   value   value
November 3, 2011        value   value   value   value
November 3, 2011        value   value   value   value

date    column_name1    column_name2    column_name3    column_name4
November 4, 2011        value   value   value   value
November 4, 2011        value   value   value   value
November 4, 2011        value   value   value   value
November 4, 2011        value   value   value   value

Answer 1: (score: 2)

In awk:

BEGIN {
    FS = "\n"
    RS = "\n\n"
    OFS = "\t"
    #ORS = "\n"
}
{
    print "date" OFS $2
    for (i = 3; i <= NF; i++)
        print $1 OFS $i
    print ""
}
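
The answer gives only the awk program body, so here is a minimal sketch of one way it might be wrapped and run. Two things are assumptions rather than part of the answer: the function name prepend_date is made up for illustration, and RS = "" (POSIX paragraph mode) is used in place of RS = "\n\n", because the behavior of a multi-character RS is not specified by POSIX and depends on the awk implementation.

```shell
# Hypothetical wrapper (name is illustrative): reads the named files, or stdin.
prepend_date() {
    # RS = "" is paragraph mode: each blank-line-separated block is one record.
    # FS = "\n" makes every line of the block a separate field.
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" }
    {
        print "date", $2            # header line, prefixed with the word "date"
        for (i = 3; i <= NF; i++)
            print $1, $i            # the date ($1) in front of each value line
        print ""                    # blank line after each block
    }' "$@"
}
```

With the data saved as 'infile', as in the first answer, it would be called as prepend_date infile. Leading and trailing blank lines in the input are ignored in paragraph mode, so a file that ends with or without a final blank line is handled the same way.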

Answer 2: (score: 2)

This GNU sed solution might work:

 sed -r '/^[A-Z][a-z]+\s+[0-9][0-9]?,\s+([0-9]{4})/,/^$/{//{h;/^$/!{s/.*//;N;s/\n/date /;b}}};G;s/(.*)\n(.*)/\2 \1/;' input_file

Edit: I should have included an explanation!

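The sed command treats specially only the lines in a range that starts at a line beginning with a date, /^[A-Z][a-z]+\s+[0-9][0-9]?,\s+([0-9]{4})/, and ends at an empty line, /^$/. Within that range, a line that matches one of those two conditions (//) is copied to the hold space (h). If that line is not empty (i.e. it is the date), it is then cleared (s/.*//), the next line is appended (N), and the literal word "date" is put in front of it (s/\n/date /). Having done all that, it branches to the end of the script (b) so that the next line can be read in. For all following lines (in this input they all fall inside such a range), it appends the hold space, which contains the date line, to the current line (G), and then uses a substitution to move the date to the front and drop the newline that G introduced (s/(.*)\n(.*)/\2 \1/). Voilà!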
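
The remember-the-date, then prefix-each-following-line logic described above can also be restated as a small line-by-line state machine. The sketch below is written in awk, not sed, so it is not the answer's own method. The names prepend_date_linewise, date, want_date and want_header are made up for illustration. It also assumes that the date is simply the first line of each block, instead of matching it with the regular expression, and it joins with a tab where the sed command uses a space.

```shell
# Hypothetical helper (name is illustrative): reads the named files, or stdin.
prepend_date_linewise() {
    awk 'BEGIN { OFS = "\t"; want_date = 1 }
    # Blank line: the block is over, so the next non-blank line is a date.
    /^[ \t]*$/  { print ""; want_date = 1; next }
    # Date line: remember it (the role of h in the sed version), print nothing.
    want_date   { date = $0; want_date = 0; want_header = 1; next }
    # Header line: prefix it with the literal word "date".
    want_header { print "date", $0; want_header = 0; next }
    # Value line: put the remembered date in front (the role of G plus s///).
                { print date, $0 }
    ' "$@"
}
```

It would be called as prepend_date_linewise input_file. Because it keeps only one date and two flags in memory, it processes the file in a single pass without holding a whole block at once.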