想要读取和计算字段值==" TRUE"仅从第3场到第5场。
INPUT.TXT
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234
在读取从第3场到第5场的标题时,i,e A,B,C想要生成唯一的组合和排列,例如A,B,C,AB,AC,AB,ABC。 注意:不包括AA,BB,CC,BA等
如果" TRUE"被认为是" AB"组合计数则不应考虑" A" conut& " B"再次计算以避免重复...
实施例#1
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
运算#1
Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,,,,1
实施例#2
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,FALSE,ab1234
运算#2
Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,1,,,
实施例#3
Locationx,Desc,A,B,C,Locationy
ab123,Name1,FALSE,TRUE,FALSE,ab1234
运算#3
Desc,A,B,C,AB,AC,BC,ABC
Name1,,1,,,,,
期望的输出:
Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,1,,,2
Name2,,,1,,1,,1
Name3,1,,,2,,,
实际档案如下:
INPUT.TXT
Locationx,Desc,INCOMING,OUTGOING,SMS,RECHARGE,DEBIT,DATA,Locationy
ab123,Name1,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,ab1234
ab123,Name2,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,ab1234
尝试过很多,没有任何实现,请提出任何建议!!!
编辑:实际输入的所需输出:
Desc,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DEBIT-DATA,INCOMING-SMS-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT,SMS-RECHARGE-DEBIT-DATA,OUTGOING-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DATA,OUTGOING-SMS-RECHARGE-DEBIT,INCOMING-RECHARGE-DEBIT-DATA,INCOMING-SMS-DEBIT-DATA,INCOMING-SMS-RECHARGE-DATA,INCOMING-SMS-RECHARGE-DEBIT,INCOMING-OUTGOING-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT,INCOMING-OUTGOING-SMS-DATA,INCOMING-OUTGOING-SMS-DEBIT,INCOMING-OUTGOING-SMS-RECHARGE,RECHARGE-DEBIT-DATA,SMS-DEBIT-DATA,SMS-RECHARGE-DATA,SMS-RECHARGE-DEBIT,OUTGOING-RECHARGE-DATA,OUTGOING-RECHARGE-DEBIT,OUTGOING-SMS-DATA,OUTGOING-SMS-DEBIT,OUTGOING-SMS-RECHARGE,INCOMING-DEBIT-DATA,INCOMING-RECHARGE-DATA,INCOMING-RECHARGE-DEBIT,INCOMING-SMS-DATA,INCOMING-SMS-DEBIT,INCOMING-SMS-RECHARGE,INCOMING-OUTGOING-DATA,INCOMING-OUTGOING-DEBIT,INCOMING-OUTGOING-RECHARGE,INCOMING-OUTGOING-SMS,DEBIT-DATA,RECHARGE-DATA,RECHARGE-DEBIT,SMS-DATA,SMS-DEBIT,SMS-RECHARGE,OUTGOING-DATA,OUTGOING-DEBIT,OUTGOING-RECHARGE,OUTGOING-SMS,INCOMING-DATA,INCOMING-DEBIT,INCOMING-RECHARGE,INCOMING-SMS,INCOMING-OUTGOING,DATA,DEBIT,RECHARGE,SMS,OUTGOING,INCOMING
Name1,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,,,,,,,,,,
Name2,,,,1,1,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Name3,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,1,,,
没有Perl和Python访问!!!
答案 0 :(得分:2)
我已经编写了一个perl
脚本来为您完成此操作。从大小和注释中可以看出,完成此操作非常简单。
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use Algorithm::Combinatorics qw(combinations);
## change the file to the path where your file exists
open my $fh, '<', 'file';
my (%data, @new_labels);
## capture the header line in an array
my @header = split /,/, <$fh>;
## backup the header
my @fields = @header;
## remove first, second and last columns
@header = splice @header, 2, -1;
## generate unique combinations
for my $iter (1 .. +@header) {
my $combination = combinations(\@header, $iter);
while (my $pair = $combination->next) {
push @new_labels, "@$pair";
}
}
## iterate through rest of the file
while(my $line = <$fh>) {
my @line = split /,/, $line;
## identify combined labels that are true
my @is_true = map { $fields[$_] } grep { $line[$_] eq "TRUE" } 0 .. $#line;
## increment counter in hash map keyed at description and then new labels
++$data{$line[1]}{$_} for map { s/ /-/g; $_ } "@is_true";
}
## print the new header
print join ( ",", "Desc", map {s/ /-/g; $_} reverse @new_labels ) . "\n";
## print the description and counter values
for my $desc (sort keys %data){
print join ( ",", $desc, ( map { $data{$desc}{$_} //= "" } reverse @new_labels ) ) . "\n";
}
Desc,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DEBIT-DATA,INCOMING-SMS-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-SMS-DEBIT-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT,SMS-RECHARGE-DEBIT-DATA,OUTGOING-RECHARGE-DEBIT-DATA,OUTGOING-SMS-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DATA,OUTGOING-SMS-RECHARGE-DEBIT,INCOMING-RECHARGE-DEBIT-DATA,INCOMING-SMS-DEBIT-DATA,INCOMING-SMS-RECHARGE-DATA,INCOMING-SMS-RECHARGE-DEBIT,INCOMING-OUTGOING-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT,INCOMING-OUTGOING-SMS-DATA,INCOMING-OUTGOING-SMS-DEBIT,INCOMING-OUTGOING-SMS-RECHARGE,RECHARGE-DEBIT-DATA,SMS-DEBIT-DATA,SMS-RECHARGE-DATA,SMS-RECHARGE-DEBIT,OUTGOING-DEBIT-DATA,OUTGOING-RECHARGE-DATA,OUTGOING-RECHARGE-DEBIT,OUTGOING-SMS-DATA,OUTGOING-SMS-DEBIT,OUTGOING-SMS-RECHARGE,INCOMING-DEBIT-DATA,INCOMING-RECHARGE-DATA,INCOMING-RECHARGE-DEBIT,INCOMING-SMS-DATA,INCOMING-SMS-DEBIT,INCOMING-SMS-RECHARGE,INCOMING-OUTGOING-DATA,INCOMING-OUTGOING-DEBIT,INCOMING-OUTGOING-RECHARGE,INCOMING-OUTGOING-SMS,DEBIT-DATA,RECHARGE-DATA,RECHARGE-DEBIT,SMS-DATA,SMS-DEBIT,SMS-RECHARGE,OUTGOING-DATA,OUTGOING-DEBIT,OUTGOING-RECHARGE,OUTGOING-SMS,INCOMING-DATA,INCOMING-DEBIT,INCOMING-RECHARGE,INCOMING-SMS,INCOMING-OUTGOING,DATA,DEBIT,RECHARGE,SMS,OUTGOING,INCOMING
Name1,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,,,,,,,,,,
Name2,,,,1,,1,,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Name3,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,1,,,
注意:请重新访问您的预期输出。从上面脚本生成的输出中可以看出,它几乎没有错误。
答案 1 :(得分:1)
以下尝试使用awk
解决此问题:
script.awk
BEGIN { FS = OFS = "," }
function combinations(flds, itr, i, pre) {
for (i=++cnt; i<=numRecs; i++) {
++n
sep = ""
for (pre=1; pre<=itr; pre++) {
newRecs[n] = newRecs[n] sep (sprintf ("%s", flds[pre]));
sep = "-"
}
newRecs[n] = newRecs[n] sep (sprintf ("%s", flds[i])) ;
}
}
NR==1 {
for (fld=3; fld<NF; fld++) {
recs[++numRecs] = $fld
}
for (iter=0; iter<numRecs; iter++) {
combinations(recs, iter)
}
next
}
!seen[$2]++ { desc[++d] = $2 }
{
y = 0;
var = sep = ""
for (idx=3; idx<NF; idx++) {
if ($idx == "TRUE") {
is_true[++y] = recs[idx-2]
}
}
for (z=1; z<=y; z++) {
var = var sep sprintf ("%s", is_true[z])
sep = "-"
}
data[$2,var]++;
}
END{
printf "%s," , "Desc"
for (k=1; k<=n; k++) {
printf "%s%s", newRecs[k],(k==n?RS:FS)
}
for (name=1; name<=d; name++) {
printf "%s,", desc[name];
for (nR=1; nR<=n; nR++) {
printf "%s%s", (data[desc[name],newRecs[nR]]?data[desc[name],newRecs[nR]]:""), (nR==n?RS:FS)
}
}
}
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234
$ awk -f script.awk file
Desc,A,B,C,A-B,A-C,A-B-C
Name1,,,,1,,2
Name2,,,1,,1,1
Name3,1,,,2,,
现在,combination
函数中存在非常明显的错误。它不会递归打印所有组合。例如:对于A B C D
,它将打印
A
B
C
AB
AC
ABC
但不是BC