Convert an unordered TXT file to CSV

Time: 2014-08-10 02:50:10

Tags: csv awk sed

I currently have the following text file, which is a dump of a website:

    [61] Title1 
    subtitle1 
    1428 Elm Street, Springwood, Ohio 0812 
    Phone: (00) 0000 0000 [62] Email 

    [61] Title2 
    Subtitle2 
    1428 Elm Street, Springwood, Ohio 0812 
    Phone: (00) 0000 0000 [65] Email [66] Website 


    62 mailto: info@yyyyyyyyyy.com 
    65 mailto: mitchellstccc@xxxxxx.com 
    66 http://www.website.com

I need to convert this to a CSV file, replacing each Email and Website label with the corresponding value listed at the bottom (if any):

    Title1, subtitle1, 1428 Elm Street, Springwood, Ohio 0812, (00) 0000 0000, Email
    Title2, subtitle2, 1428 Elm Street, Springwood, Ohio 0812, (00) 0000 0000, Email, http://www.website.com

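How can I accomplish this?

I'm trying to use awk, but my awk-fu sucks. Can anyone give me a hand? (I'm not particular about the scripting or programming language.)

Thanks!

1 answer:

Answer 0: (score: 1)

I would do this in 2 passes, e.g.: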

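    $ cat tst.awk
    BEGIN {
        # Append the input file to the argument list again so it is read twice.
        ARGV[ARGC] = ARGV[ARGC-1]; ARGC++
        # Paragraph mode: records are blank-line-separated blocks, fields are lines.
        RS = ""; FS = "\n"
    }
    # Pass 1: build the map from "[N]" markers to their values.
    NR==FNR {
        if (/^[[:digit:]]/) {
            for (i=1;i<=NF;i++) {
                key = val = $i
                sub(/[[:space:]].*/,"",key)               # keep only the leading number
                sub(/[^[:space:]]+[[:space:]]+/,"",val)   # drop the number, keep the rest
                gsub(/ /,"",val)                          # "mailto: x" becomes "mailto:x"
                map["["key"]"] = val
            }
        }
        next
    }
    # Pass 2: print one CSV line per address block.
    !/^[[:digit:]]/ {
        out = ""
        for (i=1;i<=NF;i++) {
            out = out sprintf("%s", (i>1?",":""))
            split($i,arr,/[[:space:]]+/)
            for (j=1;j in arr;j++) {
                if (arr[j] ~ /^\[.*\]$/) {
                    if (arr[j] in map) {
                        # Known marker: the label after it becomes the mapped
                        # value and the marker itself becomes a separator.
                        arr[j+1] = map[arr[j]]
                        arr[j] = ","
                    }
                    else {
                        # Unknown marker such as [61]: drop it.
                        arr[j] = ""
                    }
                }
                out = out sprintf("%s%s", (j>1?" ":""), arr[j])
            }
        }
        # Normalize the spacing around every comma.
        gsub(/[[:space:]]*,[[:space:]]*/,", ",out)
        print out
    }

    $ awk -f tst.awk file
     Title1, subtitle1, 1428 Elm Street, Springwood, Ohio 0812, Phone: (00) 0000 0000, mailto:info@yyyyyyyyyy.com
     Title2, Subtitle2, 1428 Elm Street, Springwood, Ohio 0812, Phone: (00) 0000 0000, mailto:mitchellstccc@xxxxxx.com, http://www.website.com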

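The first pass only reads the mapping from numbers to email and website values. The second pass processes only the address blocks, replacing e.g. [66] Website with the value read for 66 in the first pass.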
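
The two-pass idea can be tried in isolation with a minimal sketch. The function name expand_refs and the sample data are hypothetical and not part of the answer; the sketch relies only on the same two pieces as tst.awk, namely duplicating the file name in ARGV and testing NR==FNR to tell the passes apart. It uses awk's default line-by-line records rather than paragraph mode, so it illustrates the idiom and is not a replacement for the script.

```shell
#!/bin/sh
# expand_refs FILE (hypothetical helper): replace every [N] marker in FILE
# with the value given on a line of the form "N value" in the same file.
expand_refs() {
    awk '
    BEGIN {
        # Queue the same file a second time so awk reads it twice.
        ARGV[ARGC] = ARGV[ARGC-1]; ARGC++
    }
    NR == FNR {
        # Pass 1 (NR equals FNR only while reading the first copy):
        # remember lookup lines, ignore everything else.
        if ($1 ~ /^[0-9]+$/) map["[" $1 "]"] = $2
        next
    }
    $1 !~ /^[0-9]+$/ && NF {
        # Pass 2: skip lookup lines and blank lines, substitute known markers.
        for (i = 1; i <= NF; i++)
            if (($i) in map) $i = map[$i]
        print
    }
    ' "$1"
}

# Demo with made-up data.
demo=$(mktemp)
printf 'see [1] and [2]\n\n1 alpha\n2 beta\n' > "$demo"
expand_refs "$demo"    # prints: see alpha and beta
rm -f "$demo"
```

Reading the file twice keeps the script simple because the lookup lines sit at the end of the dump: by the time the address blocks are processed in the second pass, the whole map is already available.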