Convert single-line data into multiple lines

Time: 2017-10-12 16:17:53

Tags: bash awk sed

Consider this long input, all on a single line:

ITEM1 12-Oct-2017 DAVID BRYCE 12-Oct-2017 Sold 400,000 0.410 1.37 0.97 2.34 ITEM2 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Shipped 4,350,000 0.045 11.31 4.88 16.19 ITEM2 12-Oct-2017 DAVID BRYCE 09-Oct-2017 Shipped 2,900,000 0.045 11.31 4.88 16.19 ITEM1 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Sold 2,200,000 0.045 11.31 4.88 16.19

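How can I split this into records in bash and format it as delimited (CSV-style) text for further processing in a spreadsheet?

Example of the desired output: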

ITEM1|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34 
ITEM2|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19
ITEM2|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19
ITEM1|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19

4 answers:

Answer 0: (score: 2)

Extended GNU sed approach (for your current input):

sed -E 's/ +(ITEM[0-9]+)/\n\1/g; s/ ([0-9])/|\1/g; s/([0-9]) /\1|/g;' file

Output:

ITEM1|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34
ITEM2|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19
ITEM2|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19
ITEM1|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19
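To make the order of the three substitutions easier to follow, here is a minimal sketch that wraps the same expressions in a shell function and comments each step. The function name `to_pipes` is hypothetical (it is not part of the answer), and the sketch assumes GNU sed (for `-E` and for `\n` in the replacement) and that only the date and numeric columns have a digit next to a space.

```shell
# Sketch only: assumes GNU sed and input shaped like the question's sample.
# Step 1: put a newline in front of every ITEM<n> that follows a space,
#         so each record starts on its own line.
# Step 2: turn "space + digit" into "| + digit" (start of dates and numbers).
# Step 3: turn "digit + space" into "digit + |" (end of ITEM<n> and of dates).
# Spaces between name words touch no digit, so they are left alone.
to_pipes() {
  sed -E \
    -e 's/ +(ITEM[0-9]+)/\n\1/g' \
    -e 's/ ([0-9])/|\1/g' \
    -e 's/([0-9]) /\1|/g'
}

# Example call with a shortened, two-record input:
printf '%s\n' 'ITEM1 12-Oct-2017 DAVID BRYCE 12-Oct-2017 Sold 400,000 0.410 1.37 0.97 2.34 ITEM2 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Shipped 4,350,000 0.045 11.31 4.88 16.19' | to_pipes
```

Because the rules key on digits, a name or status that itself contains a digit would probably be split in the wrong place, so this is best treated as tailored to the sample data.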

----------

Bonus solution for the additional condition: "What if the first field is an arbitrary single word, for example FILE, STAPLER, PEN, NOTEBOOK?"

Sample file contents:

FILE 12-Oct-2017 DAVID BRYCE 12-Oct-2017 Sold 400,000 0.410 1.37 0.97 2.34 STAPLER 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Shipped 4,350,000 0.045 11.31 4.88 16.19 PEN 12-Oct-2017 DAVID BRYCE 09-Oct-2017 Shipped 2,900,000 0.045 11.31 4.88 16.19 NOTEBOOK 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Sold 2,200,000 0.045 11.31 4.88 16.19

sed -E 's/([0-9]+\.[0-9]+) +([A-Z]+)/\1\n\2/g; s/ ([0-9])/|\1/g; s/([0-9]) /\1|/g;' file

Output:

FILE|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34
STAPLER|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19
PEN|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19
NOTEBOOK|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19

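Answer 1: (score: 0)

This should do the job, provided every name is exactly two words (a three-word name such as MICHAEL LEE BRIDGES shifts the remaining columns).

sed 's/ITEM/\nITEM/g' input.txt | sed '/^$/d' | awk '{ print $1"|"$2"|"$3" "$4"|"$5"|"$6"|"$7"|"$8"|"$9"|"$10"|"$11}'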
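The awk stage above addresses fields by fixed position, which only fits two-word names. As a rough sketch of one way around that, the fields can be counted from both ends instead: the first two fields and the last seven are fixed, and whatever lies between them is the name. The function name `format_record` is hypothetical, and the sketch assumes one record per line (for example, after the sed split above) with exactly seven fields following the name.

```shell
# Sketch only: assumes each input line is a single record laid out as
#   ITEM DATE NAME... DATE STATUS QTY N1 N2 N3 N4
# so the name occupies fields 3 through NF-7.
format_record() {
  awk '{
    # Join the name words (fields 3 .. NF-7) with single spaces.
    name = $3
    for (i = 4; i <= NF - 7; i++) name = name " " $i

    # First two fields, then the name, then the last seven fields.
    out = $1 "|" $2 "|" name
    for (i = NF - 6; i <= NF; i++) out = out "|" $i
    print out
  }'
}

# Example call with a three-word name:
printf '%s\n' 'ITEM2 12-Oct-2017 MICHAEL LEE BRIDGES 09-Oct-2017 Shipped 4,350,000 0.045 11.31 4.88 16.19' | format_record
```

Building the line with `print` rather than using the data as a `printf` format string also avoids surprises if a field ever contains a `%`.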

Regards!

Answer 2: (score: 0)

An awk one-liner.

If you have GNU Awk, you can use this, since it supports a multi-character RS:

$ awk -v RS="ITEM" 'FNR>1{a=""; printf RS$1"|"$2"|"; for(i=3; i<=NF-10+2; i++){a=a$i" "}; printf a$i; while(i++<NF) printf "|"$i; printf "\n"}' file

ITEM1|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34
ITEM2|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19
ITEM2|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19
ITEM1|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19

Here we use ITEM as the record separator.
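To illustrate what that record separator does, the following small sketch reports how awk splits a line when RS is the string ITEM. The helper name `show_records` and the toy input are made up for illustration, and the sketch assumes an awk that accepts a multi-character RS, such as GNU Awk.

```shell
# Sketch only: assumes an awk with multi-character RS support (e.g. gawk).
# Prints the record number and field count of every record awk sees.
show_records() {
  awk -v RS="ITEM" '{ printf "record %d has %d fields\n", NR, NF }'
}

# Toy input: two "ITEM" records on one line.
printf 'ITEM1 a b ITEM2 c d e\n' | show_records
```

With GNU Awk the text before the first ITEM counts as an empty first record, which is why the one-liners above skip it with FNR>1. The separator itself is consumed, so they print RS back in front of $1 to restore the ITEM prefix.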

Solution 2

$ awk -v RS="ITEM" 'FNR>1{printf RS$1"|"$2"|"$3; for(i=4; i<=NF; i++) {k=(NF>10 && i<=NF-7) ? " "  : "|"; printf k$i} printf "\n"}' file 

ITEM1|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34
ITEM2|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19
ITEM2|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19
ITEM1|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19

Answer 3: (score: 0)

sed/awk

$ sed 's/ ITEM/\nITEM/g' file | 
  awk -v OFS="|" '{n=NF-10; for(i=4;i<=3+n;i++) $3=$3 FS $i; for(i=4;i<=10;i++) $i=$(i+n); NF=10}1'

ITEM1|12-Oct-2017|DAVID BRYCE|12-Oct-2017|Sold|400,000|0.410|1.37|0.97|2.34
ITEM2|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Shipped|4,350,000|0.045|11.31|4.88|16.19
ITEM2|12-Oct-2017|DAVID BRYCE|09-Oct-2017|Shipped|2,900,000|0.045|11.31|4.88|16.19
ITEM1|12-Oct-2017|MICHAEL LEE BRIDGES|09-Oct-2017|Sold|2,200,000|0.045|11.31|4.88|16.19