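I have a tab-delimited file that contains a date, a header row, some values, and a blank line, and this pattern repeats many times. The file looks like this: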
November 3, 2011
column_name1 column_name2 column_name3 column_name4
value value value value
value value value value
value value value value
value value value value

November 4, 2011
column_name1 column_name2 column_name3 column_name4
value value value value
value value value value
value value value value
value value value value
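I'm trying to find the right sed or awk command to transform the data so that I can use it to build a chart. I'd like the transformed data to look like this: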
date column_name1 column_name2 column_name3 column_name4
November 3, 2011 value value value value
November 3, 2011 value value value value
November 3, 2011 value value value value
November 3, 2011 value value value value
date column_name1 column_name2 column_name3 column_name4
November 4, 2011 value value value value
November 4, 2011 value value value value
November 4, 2011 value value value value
November 4, 2011 value value value value
Answer 0 (score: 3)
Using sed
Contents of 'infile':
$ cat infile
November 3, 2011
column_name1 column_name2 column_name3 column_name4
value value value value
value value value value
value value value value
value value value value

November 4, 2011
column_name1 column_name2 column_name3 column_name4
value value value value
value value value value
value value value value
value value value value
Contents of the sed script:
$ cat script.sed
## When the line contains a date.
/[0-9]\+,[ ]*[0-9]\{4\}/ {
    ## Save the date in the HS (hold space).
    h
    ## Append the next line (the header) to the PS (pattern space).
    N
    ## Replace the date line and its newline with 'date' plus a tab,
    ## so the header now starts with a 'date' column.
    s/.*\n/date\t/
    ## Print the header, then read the next line (the first data row).
    P
    n
}
## Blank line: print it unchanged and start the next cycle.
/^[ \t]*$/ {
    p
    d
}
## Data line: append the saved date to the end of the PS (after a newline),
## then swap the two parts so the date comes first, joined by a tab. Print it.
G
s/^\(.*\)\n\(.*\)$/\2\t\1/
p
Running the script (GNU sed is required, because the script uses \+ and \t):
$ sed -n -f script.sed infile
date column_name1 column_name2 column_name3 column_name4
November 3, 2011 value value value value
November 3, 2011 value value value value
November 3, 2011 value value value value
November 3, 2011 value value value value

date column_name1 column_name2 column_name3 column_name4
November 4, 2011 value value value value
November 4, 2011 value value value value
November 4, 2011 value value value value
November 4, 2011 value value value value
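To try the script end to end, the sketch below writes it to a temporary file and runs it on a small tab-separated sample. It assumes GNU sed (for \+ and \t); the function name demo_sed and the sample values (c1, a1, b1, ...) are made up for illustration.

```shell
# Sketch (assumes GNU sed): run the sed script above on a tiny sample.
# demo_sed and the sample values are hypothetical.
demo_sed() {
    tmp=$(mktemp -d)
    # The quoted 'EOF' keeps the backslashes in the sed program literal.
    cat > "$tmp/script.sed" <<'EOF'
/[0-9]\+,[ ]*[0-9]\{4\}/ {
h
N
s/.*\n/date\t/
P
n
}
/^[ \t]*$/ {
p
d
}
G
s/^\(.*\)\n\(.*\)$/\2\t\1/
p
EOF
    # Sample: date line, header, data rows, blank line, then a second group.
    printf 'November 3, 2011\nc1\tc2\na1\tb1\na2\tb2\n\nNovember 4, 2011\nc1\tc2\na3\tb3\n' > "$tmp/infile"
    sed -n -f "$tmp/script.sed" "$tmp/infile"
    rm -rf "$tmp"
}

demo_sed
```

Each data row comes out prefixed with its group's date, and the blank line between groups is kept.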
Answer 1 (score: 2)
In awk:
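BEGIN {
    RS = ""      # paragraph mode: records are separated by blank lines
    FS = "\n"    # each line of a record is one field
    OFS = "\t"
}
{
    print "date" OFS $2          # header line, prefixed with "date"
    for (i = 3; i <= NF; i++)    # data lines, prefixed with the date ($1)
        print $1 OFS $i
    print ""                     # blank line between groups
}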
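A self-contained run of the awk approach on a small sample, as a sketch. It uses RS = "", POSIX paragraph mode, which splits records on blank lines; a multi-character RS such as "\n\n" is an extension (gawk supports it) whose behavior POSIX leaves unspecified. The function name demo_awk and the sample values are hypothetical.

```shell
# Sketch: paragraph-mode awk on a tiny tab-separated sample.
# demo_awk and the sample values are hypothetical.
demo_awk() {
    printf 'November 3, 2011\nc1\tc2\na1\tb1\na2\tb2\n\nNovember 4, 2011\nc1\tc2\na3\tb3\n' |
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" }
    {
        print "date" OFS $2              # header row, prefixed with "date"
        for (i = 3; i <= NF; i++)        # data rows, prefixed with the date
            print $1 OFS $i
        print ""                         # blank line between groups
    }'
}

demo_awk
```

Because the whole group is one record, the date is simply field $1 and no state has to be carried between lines.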
Answer 2 (score: 2)
This GNU sed solution might work:
sed -r '/^[A-Z][a-z]+\s+[0-9][0-9]?,\s+([0-9]{4})/,/^$/{//{h;/^$/!{s/.*//;N;s/\n/date /;b}}};G;s/(.*)\n(.*)/\2 \1/;' input_file
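Edit: I should have included an explanation!
The sed command only changes the lines between a line starting with a date, /^[A-Z][a-z]+\s+[0-9][0-9]?,\s+([0-9]{4})/, and an empty line, /^$/. Within that range, if the line matches either of those conditions (//, the empty regex, reuses whichever of the two was tested last), it is copied to the hold space (h). If that line is not empty (i.e. it is the date), it then clears it (s/.*//), appends the next line (N), and replaces the newline with the literal text date (s/\n/date /). Having done all that, it branches to the end of the script (b), so the line is printed and the next one is read. All other lines fall through to G, which appends the hold space (the line containing the date) to the current line, and the substitution s/(.*)\n(.*)/\2 \1/ then moves the date to the front, dropping the newline as a side effect. Voilà!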
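The one-liner combines two sed idioms. A minimal sketch of each in isolation, assuming GNU sed; the function names and input lines are hypothetical.

```shell
# Sketch (assumes GNU sed). Function names and inputs are hypothetical.

# 1. Hold/append/swap: save the first line with h, then on every later line
#    append it with G and move it to the front with s, dropping the newline.
demo_hold_swap() {
    printf 'DATE\nv1\nv2\n' | sed -n '1{h;d;};G;s/\(.*\)\n\(.*\)/\2 \1/;p'
}

# 2. Empty regex // inside a range: on the first line it reuses the start
#    regex, on later lines the end regex, so only the boundary lines match.
demo_empty_regex() {
    printf 'x\nSTART\na\nEND\ny\n' | sed -n '/START/,/END/{//p;}'
}

demo_hold_swap      # prints "DATE v1" then "DATE v2"
demo_empty_regex    # prints "START" then "END"
```

Negating the second idiom (//!p) would instead print only the lines strictly inside the range.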