awk只计算选择性组合:

时间:2014-08-16 16:27:29

标签: awk

想要读取和计算字段值==" TRUE"仅从第3场到第5场。

INPUT.TXT

Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234

在读取从第3场到第5场的标题时,i,e A,B,C想要生成唯一的组合和排列,例如A,B,C,AB,AC,AB,ABC。 注意:不包括AA,BB,CC,BA等

如果" TRUE"被认为是" AB"组合计数则不应考虑" A" conut& " B"再次计算以避免重复...

实施例#1

Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234

运算#1

Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,,,,1

实施例#2

Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,FALSE,ab1234

运算#2

Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,1,,,

实施例#3

Locationx,Desc,A,B,C,Locationy
ab123,Name1,FALSE,TRUE,FALSE,ab1234

运算#3

Desc,A,B,C,AB,AC,BC,ABC
Name1,,1,,,,,

期望的输出:

Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,1,,,2
Name2,,,1,,1,,1
Name3,1,,,2,,,

实际档案如下:

INPUT.TXT

Locationx,Desc,INCOMING,OUTGOING,SMS,RECHARGE,DEBIT,DATA,Locationy
ab123,Name1,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,ab1234
ab123,Name2,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,ab1234

尝试过很多,没有任何实现,请提出任何建议!!!

编辑:实际输入的所需输出:

Desc,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DEBIT-DATA,INCOMING-SMS-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT,SMS-RECHARGE-DEBIT-DATA,OUTGOING-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DATA,OUTGOING-SMS-RECHARGE-DEBIT,INCOMING-RECHARGE-DEBIT-DATA,INCOMING-SMS-DEBIT-DATA,INCOMING-SMS-RECHARGE-DATA,INCOMING-SMS-RECHARGE-DEBIT,INCOMING-OUTGOING-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT,INCOMING-OUTGOING-SMS-DATA,INCOMING-OUTGOING-SMS-DEBIT,INCOMING-OUTGOING-SMS-RECHARGE,RECHARGE-DEBIT-DATA,SMS-DEBIT-DATA,SMS-RECHARGE-DATA,SMS-RECHARGE-DEBIT,OUTGOING-RECHARGE-DATA,OUTGOING-RECHARGE-DEBIT,OUTGOING-SMS-DATA,OUTGOING-SMS-DEBIT,OUTGOING-SMS-RECHARGE,INCOMING-DEBIT-DATA,INCOMING-RECHARGE-DATA,INCOMING-RECHARGE-DEBIT,INCOMING-SMS-DATA,INCOMING-SMS-DEBIT,INCOMING-SMS-RECHARGE,INCOMING-OUTGOING-DATA,INCOMING-OUTGOING-DEBIT,INCOMING-OUTGOING-RECHARGE,INCOMING-OUTGOING-SMS,DEBIT-DATA,RECHARGE-DATA,RECHARGE-DEBIT,SMS-DATA,SMS-DEBIT,SMS-RECHARGE,OUTGOING-DATA,OUTGOING-DEBIT,OUTGOING-RECHARGE,OUTGOING-SMS,INCOMING-DATA,INCOMING-DEBIT,INCOMING-RECHARGE,INCOMING-SMS,INCOMING-OUTGOING,DATA,DEBIT,RECHARGE,SMS,OUTGOING,INCOMING
Name1,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,,,,,,,,,,
Name2,,,,1,1,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Name3,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,1,,,

没有Perl和Python访问!!!

2 个答案:

答案 0 :(得分:2)

我已经编写了一个perl脚本来为您完成此操作。从大小和注释中可以看出,完成此操作非常简单。

#!/usr/bin/perl 

use strict;
use warnings; 
use autodie;
use Algorithm::Combinatorics qw(combinations);

## change the file to the path where your file exists
open my $fh, '<', 'file';

my (%data, @new_labels);

## capture the header line in an array
my @header = split /,/, <$fh>;

## backup the header
my @fields = @header;

## remove first, second and last columns
@header = splice @header, 2, -1;

## generate unique combinations
for my $iter (1 .. +@header) {
    my $combination = combinations(\@header, $iter);
    while (my $pair = $combination->next) {
        push @new_labels, "@$pair";
    }
}

## iterate through rest of the file
while(my $line = <$fh>) {
    my @line = split /,/, $line;

    ## identify combined labels that are true
    my @is_true = map { $fields[$_] } grep { $line[$_] eq "TRUE" } 0 .. $#line;

    ## increment counter in hash map keyed at description and then new labels
    ++$data{$line[1]}{$_} for map { s/ /-/g; $_ } "@is_true";
}

## print the new header
print join ( ",", "Desc", map {s/ /-/g; $_} reverse @new_labels ) . "\n";

## print the description and counter values
for my $desc (sort keys %data){     
    print join ( ",", $desc, ( map { $data{$desc}{$_} //= "" } reverse @new_labels ) ) . "\n";
}

输出:

Desc,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DEBIT-DATA,INCOMING-SMS-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-SMS-DEBIT-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT,SMS-RECHARGE-DEBIT-DATA,OUTGOING-RECHARGE-DEBIT-DATA,OUTGOING-SMS-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DATA,OUTGOING-SMS-RECHARGE-DEBIT,INCOMING-RECHARGE-DEBIT-DATA,INCOMING-SMS-DEBIT-DATA,INCOMING-SMS-RECHARGE-DATA,INCOMING-SMS-RECHARGE-DEBIT,INCOMING-OUTGOING-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT,INCOMING-OUTGOING-SMS-DATA,INCOMING-OUTGOING-SMS-DEBIT,INCOMING-OUTGOING-SMS-RECHARGE,RECHARGE-DEBIT-DATA,SMS-DEBIT-DATA,SMS-RECHARGE-DATA,SMS-RECHARGE-DEBIT,OUTGOING-DEBIT-DATA,OUTGOING-RECHARGE-DATA,OUTGOING-RECHARGE-DEBIT,OUTGOING-SMS-DATA,OUTGOING-SMS-DEBIT,OUTGOING-SMS-RECHARGE,INCOMING-DEBIT-DATA,INCOMING-RECHARGE-DATA,INCOMING-RECHARGE-DEBIT,INCOMING-SMS-DATA,INCOMING-SMS-DEBIT,INCOMING-SMS-RECHARGE,INCOMING-OUTGOING-DATA,INCOMING-OUTGOING-DEBIT,INCOMING-OUTGOING-RECHARGE,INCOMING-OUTGOING-SMS,DEBIT-DATA,RECHARGE-DATA,RECHARGE-DEBIT,SMS-DATA,SMS-DEBIT,SMS-RECHARGE,OUTGOING-DATA,OUTGOING-DEBIT,OUTGOING-RECHARGE,OUTGOING-SMS,INCOMING-DATA,INCOMING-DEBIT,INCOMING-RECHARGE,INCOMING-SMS,INCOMING-OUTGOING,DATA,DEBIT,RECHARGE,SMS,OUTGOING,INCOMING
Name1,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,,,,,,,,,,
Name2,,,,1,,1,,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Name3,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,1,,,

注意:请重新访问您的预期输出。从上面脚本生成的输出中可以看出,它几乎没有错误。

答案 1 :(得分:1)

以下尝试使用awk解决此问题:

script.awk

的内容
BEGIN { FS = OFS = "," }

function combinations(flds, itr,    i, pre) {
    for (i=++cnt; i<=numRecs; i++) {
        ++n
        sep = ""
        for (pre=1; pre<=itr; pre++) {
            newRecs[n] = newRecs[n] sep (sprintf ("%s", flds[pre]));
            sep = "-"
        }
        newRecs[n] = newRecs[n] sep (sprintf ("%s", flds[i])) ;
    }
}

NR==1 {
    for (fld=3; fld<NF; fld++) {
        recs[++numRecs] = $fld
    }
    for (iter=0; iter<numRecs; iter++) {
        combinations(recs, iter)
    }
    next
}

!seen[$2]++ { desc[++d] = $2 }

{
    y = 0;
    var = sep = ""
    for (idx=3; idx<NF; idx++) {
        if ($idx == "TRUE") {
            is_true[++y] = recs[idx-2]
        }
    }
    for (z=1; z<=y; z++) {
        var = var sep sprintf ("%s", is_true[z])
        sep = "-"
    }
    data[$2,var]++;
}
END{
     printf "%s," , "Desc"
    for (k=1; k<=n; k++) {
        printf "%s%s", newRecs[k],(k==n?RS:FS)
    }
    for (name=1; name<=d; name++) {
        printf "%s,", desc[name];
        for (nR=1; nR<=n; nR++) {
            printf "%s%s", (data[desc[name],newRecs[nR]]?data[desc[name],newRecs[nR]]:""), (nR==n?RS:FS)
        }
    }
}

示例文件

Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234

执行:

$ awk -f script.awk file
Desc,A,B,C,A-B,A-C,A-B-C
Name1,,,,1,,2
Name2,,,1,,1,1
Name3,1,,,2,,

现在,combination函数中存在非常明显的错误。它不会递归打印所有组合。例如:对于A B C D,它将打印

A
B
C
AB
AC
ABC

但不是BC