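Split a single column into multiple columns based on different terms

Time: 2016-03-03 15:57:45

Tags: bash

I've searched the questions that have already been asked, but couldn't find one that matches what I'm trying to solve.

I'm on a Mac, using Terminal. I'd like this to run as part of another script written in bash.

I have a CSV file containing a single column. Under each "header" there will be a varying number of devices, depending on the output. The headers (SerialNumbers, DeviceName, PurchaseDate) will always remain the same.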

SerialNumbers
A1B2C3D4E5F6
SASIUWOI9828
I3I6K36H78SK
DeviceName
This one has a short name
This one has a long name
This one has a medium name
PurchaseDate
2016-02-19
2016-02-01
2016-02-12

Desired output

SerialNumbers,DeviceName,PurchaseDate
A1B2C3D4E5F6,This one has a short name,2016-02-19
SASIUWOI9828,This one has a long name,2016-02-01
I3I6K36H78SK,This one has a medium name,2016-02-12

Here is my source file, in case it helps

https://www.dropbox.com/s/wapjqbi1v3oah3p/tobecorrected.csv?dl=0

4 answers:

Answer 0: (score: 1)

I'm not sure whether pr is available on your OS, but this is the simplest approach:

$ pr -3ts, file

SerialNumbers,DeviceName,PurchaseDate
A1B2C3D4E5F6,This one has a short name,2016-02-19
SASIUWOI9828,This one has a long name,2016-02-01
I3I6K36H78SK,This one has a medium name,2016-02-12
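
To unpack the one-liner: -3 asks pr for three columns, -t drops the page header and trailer, and -s, joins the columns with a comma. pr simply fills the columns top to bottom, so the result only lines up when every header has the same number of entries. A minimal sketch of a guarded wrapper follows; the function name split_with_pr is made up for illustration, and the "multiple of 3" test is only a cheap sanity check, not proof that the three blocks are the same length.

```shell
split_with_pr() {
    local file=$1 total
    # The answer is unsure whether pr exists everywhere, so check first.
    if ! command -v pr >/dev/null 2>&1; then
        echo "pr is not installed" >&2
        return 1
    fi
    # Arithmetic expansion also strips the padding some wc versions print.
    total=$(( $(wc -l < "$file") ))
    if (( total % 3 != 0 )); then
        echo "line count $total is not a multiple of 3; columns would be misaligned" >&2
        return 1
    fi
    # -3: three columns, -t: no page header/trailer, -s,: comma separator
    pr -3 -t -s, "$file"
}
```

Called as split_with_pr file, it should print the same CSV as above, or refuse with a message when the line count cannot be split into three equal columns.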

Answer 1: (score: 0)

Assuming the headers always appear in the same order, you can use the following script, convert.sh:

#!/bin/bash
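# Each awk call prints one header together with the lines that follow it.
C1="$(awk '/SerialNumbers/{flag=1} /DeviceName/{flag=0} flag' "$1")"
C2="$(awk '/DeviceName/{flag=1} /PurchaseDate/{flag=0} flag' "$1")"
C3="$(awk '/PurchaseDate/,0' "$1")"
# Join the three blocks side by side. -d is the POSIX option and must come
# before the operands; BSD paste on macOS has no --delimiters long option.
paste -d ',' <(echo "$C1") <(echo "$C2") <(echo "$C3")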

Example:

./convert.sh test.txt

Output:

SerialNumbers,DeviceName,PurchaseDate
A1B2C3D4E5F6,This one has a short name,2016-02-19
SASIUWOI9828,This one has a long name,2016-02-01
I3I6K36H78SK,This one has a medium name,2016-02-12
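
The script leans on a common awk idiom: set a flag at the opening header, clear it at the next header, and let a bare flag pattern print every line seen while it is set. Here is a small sketch of that idiom on its own. The helper name extract_block is hypothetical, and unlike the script it compares whole lines with == rather than using a regex, which is an assumption that headers sit alone on their line.

```shell
extract_block() {
    # $1: header that opens the block
    # $2: header that closes it (pass "" for the last block in the file)
    # $3: input file
    awk -v start="$1" -v stop="$2" '
        $0 == start              { flag = 1 }   # start printing at the opening header
        stop != "" && $0 == stop { flag = 0 }   # stop just before the next header
        flag                                    # print every line while the flag is set
    ' "$3"
}
```

For instance, extract_block DeviceName PurchaseDate test.txt should print the DeviceName header followed by the three device names, which is what C2 holds in the script.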

Answer 2: (score: 0)

This awk handles the headers in any order, with variable-length data following each header:

awk  '
/SerialNumbers/ {sn=1; dn=0; pd=0}
/DeviceName/ {sn=0; dn=1; pd=0}
/PurchaseDate/ {sn=0; dn=0; pd=1}

sn==1 {snl[++snc]=$0}
dn==1 {dnl[++dnc]=$0}
pd==1 {pdl[++pdc]=$0}

END{
    max=snc>dnc?snc:dnc;
    max=pdc>max?pdc:max;
    for (i=1;i<=max;i++)
        print snl[i]","dnl[i]","pdl[i]
}' file
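
The same idea can be sketched without one flag and one array per header, by passing the header names in as a list. This is only an illustration under a few assumptions: the function name split_columns and the variable names are invented here, headers are matched as whole lines, and no data value is identical to a header name. Columns come out in the order of the list, and shorter columns are padded with empty fields.

```shell
split_columns() {
    # $1: comma-separated header names, in the desired output order
    # $2: input file with one value per line
    awk -v headers="$1" '
    BEGIN {
        n = split(headers, names, ",")
        for (i = 1; i <= n; i++) colof[names[i]] = i   # header text -> column number
    }
    ($0 in colof) { cur = colof[$0] }                  # header line: switch column
    cur {                                              # store header and data lines alike
        cell[cur, ++rows[cur]] = $0
        if (rows[cur] > max) max = rows[cur]
    }
    END {
        for (r = 1; r <= max; r++) {
            line = cell[1, r]
            for (c = 2; c <= n; c++) line = line "," cell[c, r]
            print line
        }
    }' "$2"
}
```

Usage would be along the lines of split_columns "SerialNumbers,DeviceName,PurchaseDate" file.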

Edit

Given your example file, you can do:

awk '/^[[:alnum:]]+:/ {sub(/:/,""); idx=$0; arr[idx]=$0; next}
{arr[idx]=arr[idx]","$1}
END{
    for (id in arr) print arr[id]}' file.txt | rs -c',' -C',' -T | sed 's/,$//'

Prints:

serialNumber,bluetoothAddress,wifiAddress,enclosureColor,totalDiskCapacity
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQF,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQG,0.214583,0.214583,#b4b5b9,1585
DMPQG,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQG,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585

If your fields contain spaces, replace {arr[idx]=arr[idx]","$1} with:

{  
    sub(/^[[:space:]]+/,"")
    sub(/[[:space:]]+$/,"")
    arr[idx]=arr[idx]","$0
}

It then prints:

serialNumber,bluetoothAddress,wifiAddress,enclosureColor,totalDiskCapacity
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQF,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQG,0.214583,0.214583,#b4b5b9,1585
DMPQG,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583 B59,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQG,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585
DMPQD,0.214583,0.214583,#b4b5b9,1585

(Note the longer line with B59 added.)
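
Two things about the pipeline above may be worth knowing: for (id in arr) visits keys in an unspecified order, and rs is not installed on every system. As a rough alternative, the transposition can be done inside awk itself. This sketch assumes the input format implied by the regex above (a header line such as serialNumber: followed by one value per line); the function name colon_blocks_to_csv is made up, and it spells the character classes out as [A-Za-z0-9] and [ \t] only so that it also runs on awk builds without POSIX classes.

```shell
colon_blocks_to_csv() {
    awk '
    /^[A-Za-z0-9]+:/ {                  # header line, e.g. "serialNumber:"
        sub(/:/, "")
        col++                           # columns keep the order in which headers appear
        rows[col] = 1
        cell[col, 1] = $0
        if (max < 1) max = 1
        next
    }
    col {                               # data line under the current header
        sub(/^[ \t]+/, "")              # trim leading whitespace
        sub(/[ \t]+$/, "")              # trim trailing whitespace
        cell[col, ++rows[col]] = $0
        if (rows[col] > max) max = rows[col]
    }
    END {
        for (r = 1; r <= max; r++) {
            line = cell[1, r]
            for (c = 2; c <= col; c++) line = line "," cell[c, r]
            print line
        }
    }' "$1"
}
```

Running colon_blocks_to_csv file.txt should give the header row first and then one CSV row per device, with no trailing comma to strip.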

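Answer 3: (score: 0)

Just for variety, here is a solution that doesn't use awk. Note that the input file needs a trailing newline for the output to be correct, and I assume the headers and their order are known in advance (otherwise the first if statement would need to change).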

#!/bin/bash

filename="$1"

declare -a arr=("SerialNumbers" "DeviceName" "PurchaseDate")
declare -A output  # associative array; needs bash 4 or later

col=0
while read -r line
do
    if [[ "${arr[$col]}" == "$line" ]]; then # header
        col=$((col+1))
        row=1
        output[$((row-1)),$((col-1))]=$line
    else
        output[$row,$((col-1))]=$line
        row=$((row+1))
    fi
done < "$filename"

# print results
for ((i=0;i<row;i++)) do
    for ((j=0;j<col;j++)) do
        printf '%s' "${output[$i,$j]}"  # never use data as the printf format string
        if (( j < col-1)); then
            printf ","
        fi
    done
    echo
done

Output:

$ ./script.sh example.txt
SerialNumbers,DeviceName,PurchaseDate
A1B2C3D4E5F6,This one has a short name,2016-02-19
SASIUWOI9828,This one has a long name,2016-02-01
I3I6K36H78SK,This one has a medium name,2016-02-12
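
Since the question mentions a Mac: declare -A needs bash 4 or later, while the /bin/bash that ships with macOS has long been version 3.2, so the script above may fail there unless a newer bash is installed. Below is a sketch of the same approach using only a plain indexed array, with cell (row, col) stored at index row * ncols + col. The function name split_columns_bash is hypothetical, and the header list is still assumed to be known in advance and in order. The read loop also accepts a last line without a trailing newline.

```shell
split_columns_bash() {
    local headers=("SerialNumbers" "DeviceName" "PurchaseDate")
    local ncols=${#headers[@]}
    local cells=()                 # flat array: (row, col) lives at row * ncols + col
    local col=-1 row=0 rows=0 line idx r c out
    # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        if (( col + 1 < ncols )) && [[ "$line" == "${headers[col+1]}" ]]; then
            col=$((col + 1))       # next expected header found: start a new column
            row=0
        fi
        if (( col < 0 )); then
            continue               # ignore anything before the first header
        fi
        idx=$(( row * ncols + col ))
        cells[idx]=$line
        row=$((row + 1))
        if (( row > rows )); then
            rows=$row              # remember the longest column
        fi
    done < "$1"
    for (( r = 0; r < rows; r++ )); do
        out=""
        for (( c = 0; c < ncols; c++ )); do
            if (( c > 0 )); then out+=","; fi
            idx=$(( r * ncols + c ))
            out+="${cells[idx]}"
        done
        printf '%s\n' "$out"
    done
}
```

Sourced into a script, split_columns_bash example.txt should print the same three-column CSV as above.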