How do I split strings from an array of repeating fields?

Time: 2012-09-28 06:17:06

Tags: bash shell sed awk

I have an input file with the following syntax:

"ID","Company Name","AccountManager","Product","Support Type","Country"

Example:

"1","Company one","Surname Name / Phone/ Cell Phone ","Product► (d2XXXXXX) ► Version","29.10.2012 ► Type of support","Singapore"

"2","Company two","Surname Name / Phone/ Cell Phone ","Product► (d2XXXXXX) ► Version\nProduct► (d2XXXXXX) ► Version\nProduct► (d2XXXXXX) ► Version","31.10.2012 ► Type of support\n28.10.2012 ► Type of support\nn/a ► Type of support","Indonesia"

"3","Company three","Surname Name / Phone/ Cell Phone ","Product► (d2XXXXXX) ► Version\nProduct► (d2XXXXXX) ► Version\nProduct► (d2XXXXXX) ► Version\nProduct► (d2XXXXXX) ► Version\nProduct► (d2XXXXXX) ► Version\nProduct► (d2XXXXXX) ► Version","31.12.2012 ► Type of support\nType of support\nn\\a ► Type of support\n31.03.2013 ► Type of support\n25.10.2012 ► Type of support\nn\\a ► Type of support","USA"

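The first company has only one product. The second company has 3 products, separated by \n (in both the Product and Support Type columns), and the third company has 6 products.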
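
Note that the separator in the sample is the two-character sequence backslash plus n, not a real newline. A minimal sketch of splitting such a field in awk, assuming the field text has already been isolated (the helper name split_products and the sample product names are made up for illustration):

```shell
# Hypothetical helper: print one numbered product per line from a field
# whose entries are separated by a literal \n (backslash followed by n).
split_products() {
    printf '%s\n' "$1" | awk '{
        # The regex /\\n/ matches a backslash followed by the letter n
        n = split($0, parts, /\\n/)
        for (i = 1; i <= n; i++)
            print i ": " parts[i]
    }'
}

# Made-up sample value; single quotes keep the backslashes literal
split_products 'Product A\nProduct B\nProduct C'
```

Using the count returned by split() and an indexed loop keeps the entries in their original order, which matters when the Product and Support Type lists have to be matched up by position.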

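In the output, each product must get its own row, with the values of the other columns repeated:

"ID","Company Name","AccountManager","Country". However, "AccountManager" should contain only the surname and name, and the Support Type date should be compared with today's date: a row should appear in the output file only if its support date is 27 to 32 days away from today. Entries with n/a in Support Type should be skipped.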
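
Two of these requirements can be illustrated in isolation: reducing AccountManager to the part before the first slash, and keeping only the leading date of a Support Type entry. This is a rough sketch under the assumption that each value is passed in without its surrounding quotes; the helper names manager_name and support_date are hypothetical.

```shell
# Hypothetical helper: reduce an AccountManager value to "Surname Name".
manager_name() {
    printf '%s\n' "$1" | awk '{
        split($0, parts, "/")            # parts[1] is the text before the first slash
        gsub(/^ +| +$/, "", parts[1])    # trim leading and trailing spaces
        print parts[1]
    }'
}

# Hypothetical helper: print the leading dd.mm.yyyy date of a support entry.
# Prints nothing for entries such as "n/a ..." that do not start with a date.
support_date() {
    printf '%s\n' "$1" |
        awk '$1 ~ /^[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9][0-9]$/ { print $1 }'
}

manager_name 'Surname Name / Phone/ Cell Phone '
support_date '29.10.2012 ► Type of support'
support_date 'n/a ► Type of support'
```

Because support_date prints nothing for entries without a date, a caller can treat empty output as "skip this entry".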

The output should look like this:

"1","Company one","Surname Name","Product► (d2XXXXXX) ► Version","29.10.2012","Singapore"
"2","Company two","Surname Name","Product► (d2XXXXXX) ► Version","28.10.2012","Indonesia"
"2","Company two","Surname Name","Product► (d2XXXXXX) ► Version","31.10.2012","Indonesia"
"3","Company three","Surname Name","Product► (d2XXXXXX) ► Version","25.10.2012","USA"

How can I do this in bash?

1 answer:

Answer 0 (score: 2)

You can get it with the following AWK script, named "products.awk":

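#!/usr/bin/awk -f

BEGIN {
    # note: a plain comma separator assumes no field contains a comma itself
    FS = ",";
    # current time as a Unix timestamp (seconds)
    now_cmd = "date +\"%s\"";
    now_cmd | getline curr_timestamp;
    close(now_cmd);
}

{
    # keep only "Surname Name" from the AccountManager column
    split($3, account, "/");
    gsub(/ $/, "", account[1]);
    # entries are separated by a literal \n (backslash followed by n)
    n = split($4, products, "\\\\n");
    split($5, supports, "\\\\n");
    # indexed loop: "for (i in products)" does not guarantee any order
    for (i = 1; i <= n; i++) {
        gsub("\"", "", products[i]);
        gsub("\"", "", supports[i]);
        split(supports[i], timesupport, " ");
        # ignore not available and support without date
        if (supports[i] !~ /n\\\\a*/ && supports[i] !~ /n\/a*/ && $2 !~ /NULL/ && timesupport[1] ~ /[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9][0-9]/) {
            # formatting date: dd.mm.yyyy -> yyyy/mm/dd
            split(timesupport[1], date, "[.]");
            mydate = "date -d \"" date[3] "/" date[2] "/" date[1] "\" \"+%s\"";
            # date to timestamp (using GNU date through the shell)
            mydate | getline timestamp;
            # close the command so that a repeated date is evaluated again
            close(mydate);
            # timestamp is >= 27 days and <= 32 days
            if ((timestamp - curr_timestamp) >= 2332800 && (timestamp - curr_timestamp) <= 2764800)
                print $1 "," $2 "," account[1] "\",\"" products[i] "\",\"" supports[i] "\"," $6;
        }
    }
}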
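
One limitation worth keeping in mind: with a plain comma as the field separator, a comma inside a quoted value would shift every later column. A possible workaround, sketched here under the assumption that every field is always wrapped in double quotes, is to split on the three-character sequence `","` instead. The helper name company_name and the sample row are made up for illustration.

```shell
# Hypothetical helper: print the Company Name column of a fully quoted
# CSV line read from stdin, even if a value contains a comma.
company_name() {
    awk -F'","' '{
        # After splitting on "," the first field still carries the opening
        # quote and the last field the closing quote; strip both.
        sub(/^"/, "", $1)
        sub(/"$/, "", $NF)
        print $2
    }'
}

# Made-up row with a comma inside the company name
printf '%s\n' '"7","Acme, Inc.","Doe John / 555","Singapore"' | company_name
```

This does not handle escaped quotes inside values or fields that are left unquoted; for input like that, a real CSV parser is the safer choice.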

Assuming your data is in a file named data.txt, you can invoke this script from bash with the following line:

awk -f products.awk data.txt

Running the script on your sample file, I got this output:

"1","Company one","Surname Name","Product► (d2XXXXXX) ► Version","29.10.2012 ► Type of support","Singapore"
"2","Company two","Surname Name","Product► (d2XXXXXX) ► Version","31.10.2012 ► Type of support","Indonesia"
"2","Company two","Surname Name","Product► (d2XXXXXX) ► Version","28.10.2012 ► Type of support","Indonesia"

Edit

I only get 3 lines because the last one does not satisfy the >= 27 && <= 32 condition (today is September 29, and your question is from September 28).
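
The arithmetic behind this can be checked with a short sketch. It assumes GNU date (for the -d option) and counts whole days between two calendar dates in UTC; the helper name days_between is hypothetical. The script above compares against the current time of day rather than midnight, so its differences are slightly smaller than these whole-day counts.

```shell
# Hypothetical helper: whole days from FROM to TO (both YYYY-MM-DD).
# Assumes GNU date; -u avoids daylight-saving shifts in the result.
days_between() {
    from=$(date -u -d "$1" +%s) || return 1
    to=$(date -u -d "$2" +%s) || return 1
    echo $(( (to - from) / 86400 ))
}

# Offsets of the four sample support dates from September 29, 2012
for d in 2012-10-29 2012-10-31 2012-10-28 2012-10-25; do
    echo "$d: $(days_between 2012-09-29 "$d") days"
done
```

Counted from September 29, the offsets are 30, 32, 29 and 26 days, so 25.10.2012 falls just below the 27-day lower bound.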

Finally, we got it!