Question

I have a text data file that looks like this:

Day-Hour, 08188, 0, 08188, 1, (indicating the time is year 2008, julian day 188, between hour0 and hour1)
Receptor, A, (actual data begins)
1, 2, 3, 4,
5, 6, 7, 8,
Receptor, B,
1, 2, 3, 4,
5, 6, 7, 8,
... (continue data for a total of 22 receptors, each receptor has 8 data values)

Day-Hour, 08188, 1, 08188, 2,
Receptor, A,
1, 2, 3, 4,
5, 6, 7, 8,
Receptor, B,
1, 2, 3, 4,
5, 6, 7, 8,
... (continue data for a total of 22 receptors, each receptor has 8 data values, but this is for hours 1 to 2)

...... (continue the same previous pattern for a total of 24 times)

I would like to reformat it as:

day, time, receptor, data1, data2, data3, ....data8  (header)
08188, 0, A, 1, 2, 3, 4, 5, 6, 7, 8
08188, 0, B, 1, 2, 3, 4, 5, 6, 7, 8
... (repeat the same hour for all 22 receptor sites)
08188, 1, A, 1, 2, 3, 4, 5, 6, 7, 8
08188, 1, B, 1, 2, 3, 4, 5, 6, 7, 8 
...(repeat the same hour for all 22 receptor sites)
...
...(repeat the same order 24 times)

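I have managed to get the format I want in several steps, using a combination of awk and sed as shown below, but going through this many steps feels silly, and I would like the experts' help to solve this in fewer steps. Thanks a lot for your input!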

(example steps:)
step1:  $ grep -v "Day-Hour" infile.txt > temp1.txt  # remove all Day-Hour lines, 
                                                     # as I know the order of the data
step2:  $ sed '/^$/d' temp1.txt > temp2.txt  # remove empty lines
step3:  $ awk 'ORS=NR%3?" ":"\n"' temp2.txt > temp3.txt  # concatenate every 3 lines
step4:  $ (create a file, e.g. daytime.txt, with 2 fields (day and hour) with following content)
         08188, 0,
         (repeat 22 times)
         08188, 1,
         (repeat 22 times)
         .... (continue through hour 23)
step5:  $ paste daytime.txt temp3.txt > final.txt
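
For reference, step 3 relies on awk's output record separator (ORS). Here is a minimal sketch of that idiom on a made-up six-line sample; the file names temp2_sample.txt and temp3_sample.txt are hypothetical stand-ins, not my real files:

```shell
# Six sample lines standing in for temp2.txt (two receptors, three lines each).
printf '%s\n' \
  'Receptor, A,' '1, 2, 3, 4,' '5, 6, 7, 8,' \
  'Receptor, B,' '1, 2, 3, 4,' '5, 6, 7, 8,' > temp2_sample.txt

# ORS becomes a space after lines 1 and 2 of each group and a newline after line 3.
# The assignment doubles as the pattern: its value is a non-empty string,
# so it is always true and every line is printed.
awk 'ORS = NR % 3 ? " " : "\n"' temp2_sample.txt > temp3_sample.txt
```

Each receptor then ends up on a single line in temp3_sample.txt.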

Answer 1

This will join them:

sed 's/$/,/;N;N;N;N;N;N;N; s/\n/ /g' foo.txt

into this:

Day-Hour, 08188, 0, 08188, 1, Receptor, A, 1, 2, 3, 4, 5, 6, 7, 8, Receptor, B, 1, 2, 3, 4, 5, 6, 7, 8,
Day-Hour, 08188, 1, 08188, 2, Receptor, A, 1, 2, 3, 4, 5, 6, 7, 8, Receptor, B, 1, 2, 3, 4, 5, 6, 7, 8,

Then I got lazy with the repacking:

... | awk '{ $1 = ""; $4 = ""; $5 = ""; print }' | sed -e 's/ \(.*\)  Receptor, \(A,.*\)Receptor, \(B, .*\)/\1\2\n\1\3/'

which produced the desired output on my system.
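
To see what the join step does in isolation, here is a small sketch. It assumes each Day-Hour block is exactly eight lines (the header, two receptors of three lines each, and one blank separator line); the file names foo_sample.txt and joined_sample.txt are made up for the illustration:

```shell
# One eight-line block: Day-Hour header, receptors A and B, then a blank line.
printf '%s\n' \
  'Day-Hour, 08188, 0, 08188, 1' \
  'Receptor, A,' '1, 2, 3, 4,' '5, 6, 7, 8,' \
  'Receptor, B,' '1, 2, 3, 4,' '5, 6, 7, 8,' \
  '' > foo_sample.txt

# s/$/,/     appends a comma to the Day-Hour line
# N (x7)     pulls the next seven lines into the pattern space
# s/\n/ /g   turns the embedded newlines into spaces
sed 's/$/,/;N;N;N;N;N;N;N; s/\n/ /g' foo_sample.txt > joined_sample.txt
```

The whole block comes out as one line, which is what the awk and sed repacking stage then works on.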

Answer 2

This should do the job:

cat file
Day-Hour, 08188, 0, 08188, 1
Receptor, A,
1, 2, 3, 4,
5, 6, 7, 8,
Receptor, B,
11, 12, 13, 14,
15, 16, 17, 18,
Receptor, C,
21, 22, 23, 24,
25, 26, 27, 28,

Day-Hour, 08188, 1, 08188, 2
Receptor, A,
1, 2, 3, 4,
5, 6, 7, 8,
Receptor, B,
1, 2, 3, 4,
5, 6, 7, 8,

awk -v RS= -v OFS=", " -F", *|\n" 'BEGIN {print "day, time, receptor, data1, data2, data3,....data8"} {for (i=7;i<=NF;i+=13) print $2,$3,$i,$(i+2),$(i+3),$(i+4),$(i+5),$(i+7),$(i+8),$(i+9),$(i+10)}' file
day, time, receptor, data1, data2, data3,....data8
08188, 0, A, 1, 2, 3, 4, 5, 6, 7, 8
08188, 0, B, 11, 12, 13, 14, 15, 16, 17, 18
08188, 0, C, 21, 22, 23, 24, 25, 26, 27, 28
08188, 1, A, 1, 2, 3, 4, 5, 6, 7, 8
08188, 1, B, 1, 2, 3, 4, 5, 6, 7, 8

This will print all receptors, whether there is 1 or 22.
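
If the field offsets look arbitrary, the following sketch may help: it dumps every field of one record with its index, using the same RS and -F settings. The file names sample.txt and fields_sample.txt are hypothetical, and the sample assumes the Day-Hour line has no trailing comma, as in the input shown above:

```shell
# One Day-Hour block with two receptors, same layout as the input above.
printf '%s\n' \
  'Day-Hour, 08188, 0, 08188, 1' \
  'Receptor, A,' '1, 2, 3, 4,' '5, 6, 7, 8,' \
  'Receptor, B,' '11, 12, 13, 14,' '15, 16, 17, 18,' > sample.txt

# RS= reads a blank-line-separated block as one record; the separator is
# either a comma plus optional spaces or a newline. A comma directly followed
# by a newline therefore yields an empty field between the two separators.
awk -v RS= -F', *|\n' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }' \
  sample.txt > fields_sample.txt
```

The receptor names land in fields 7 and 20, with an empty field after each line-ending comma. That is why the loop starts at 7, steps by 13, and skips $(i+1) and $(i+6).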

Reformat a text file using awk or sed