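The input file comes from the output of an SQL query that uses GROUP_CONCAT, and it contains some duplicate values. The query already uses DISTINCT, but that isn't enough, because the values are only partly identical: the duplicates are individual substrings, not whole values.
The column I'm interested in is column 9. The idea is to print each IAB category only once, on a single line.
Sample from the file: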
148422,0.72499999999999998,0.72499999999999998,0.72500000000165021,wpolityce.pl,300x250,standard,3,"IAB3;IAB11;IAB17;IAB12;IAB9;IAB15;IAB23,IAB3;IAB11;IAB17;IAB12;IAB9;IAB13;IAB23,IAB3;IAB11;IAB12;IAB9"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,728x90,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23,IAB3;IAB11;IAB12;IAB13;IAB23,IAB3;IAB11;IAB12;IAB9"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,750x100,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23,IAB3;IAB11;IAB12;IAB13;IAB23,IAB3;IAB11;IAB12;IAB9"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,750x200,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23,IAB3;IAB11;IAB12;IAB13;IAB23,IAB3;IAB11;IAB12;IAB9"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,750x300,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23,IAB3;IAB11;IAB12;IAB13;IAB23,IAB3;IAB11;IAB12;IAB9"
I want to remove the duplicate IAB categories, so the first line would look like this:
148422,0.72499999999999998,0.72499999999999998,0.72500000000165021,wpolityce.pl,300x250,standard,3,"IAB3;IAB11;IAB17;IAB12;IAB9;IAB15;IAB23;IAB13"
In my SQL query I have something like this:
SELECT GROUP_CONCAT(DISTINCT foo) FROM t;
Now the foo column can contain values like these:
foo
bar
qrr
foo;bar
foo;qrr
foo
foo;qrr
bar
qrr
foo
Concatenating these values with DISTINCT removes only the exact duplicates. Listed one per line, what remains is:
foo
bar
qrr
foo;bar
foo;qrr
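I'm interested in the individual values (foo, bar and qrr). If the separator used for concatenation is ;, the result looks as if not all duplicates were removed, because foo;bar and foo;qrr repeat values that also appear on their own.
After concatenating with ;, the final output in that column should be:
foo;bar;qrr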
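How can I remove these duplicates?
I tried to do it myself, but I'm not that advanced in AWK and the like.
I'm working in Bash, although I could also do this one step earlier, in SQLite.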
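Since doing this one step earlier in SQLite is mentioned as an option, here is a sketch of that route, under stated assumptions: it needs the sqlite3 command-line shell, reuses the table t and column foo from the query above, loads the sample foo rows into an in-memory database purely for illustration, and wraps everything in a hypothetical function distinct_tokens_sqlite. A recursive CTE splits every ;-separated value into single tokens first, so GROUP_CONCAT(DISTINCT ...) compares individual tokens instead of whole values.

```shell
#!/usr/bin/env bash
# Sketch, assuming the sqlite3 CLI is installed. distinct_tokens_sqlite is a
# hypothetical name; table t / column foo and the rows come from the example.
distinct_tokens_sqlite() {
  sqlite3 :memory: <<'SQL'
CREATE TABLE t (foo TEXT);
INSERT INTO t (foo) VALUES
  ('foo'), ('bar'), ('qrr'), ('foo;bar'), ('foo;qrr'),
  ('foo'), ('foo;qrr'), ('bar'), ('qrr'), ('foo');

WITH RECURSIVE split(word, rest) AS (
  -- Seed row: no word yet, the whole value plus a trailing ';' sentinel
  SELECT '', foo || ';' FROM t
  UNION ALL
  -- Each step peels off the text before the first remaining ';'
  SELECT substr(rest, 1, instr(rest, ';') - 1),
         substr(rest, instr(rest, ';') + 1)
  FROM split
  WHERE rest <> ''
)
-- group_concat(DISTINCT x) joins with ',', so swap that for ';'
SELECT replace(group_concat(DISTINCT word), ',', ';')
FROM split
WHERE word <> '';
SQL
}

distinct_tokens_sqlite
```

It should print the three tokens foo, bar and qrr joined by ;. SQLite does not guarantee the order in which group_concat visits rows, and the final replace() assumes the tokens themselves never contain a comma.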
Answer 0 (score: 1)
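As long as the column to be processed is always the only one in double quotes, and it's acceptable to replace all the separators with semicolons, this does what you ask: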
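use strict;
use warnings 'all';
use List::Util 'uniq';   # uniq needs List::Util 1.45 or later
while ( <> ) {
    # Replace the double-quoted field with its unique \w+ tokens joined by ';'
    s{ " ([^"]+) " }{ '"' . join(';', uniq $1 =~ /\w+/g) . '"' }ex;
    print;
}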
Output:
148422,0.72499999999999998,0.72499999999999998,0.72500000000165021,wpolityce.pl,300x250,standard,3,"IAB3;IAB11;IAB17;IAB12;IAB9;IAB15;IAB23;IAB13"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,728x90,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23;IAB9"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,750x100,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23;IAB9"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,750x200,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23;IAB9"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,750x300,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23;IAB9"
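The Perl answer keeps the first occurrence of each token found in the quoted field. For a single concatenated value, the same idea can be sketched with standard tools (tr, awk, paste). This is an illustration, not part of the answer: dedupe_tokens is a hypothetical helper name, and the sketch assumes the tokens themselves never contain , or ;.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a value on ',' and ';', drop repeated tokens
# (first occurrence wins) and join the survivors with ';'.
dedupe_tokens() {
  printf '%s\n' "$1" |
    tr ',;' '\n\n' |          # one token per line
    awk 'NF && !seen[$0]++' | # skip empty lines, print each token once
    paste -sd ';' -           # join the lines back with ';'
}

dedupe_tokens 'foo,bar,qrr,foo;bar,foo;qrr'   # prints foo;bar;qrr
```

Keeping first-seen order (rather than piping through sort -u) preserves the order in which the categories first appear, which matches the expected output in the question.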
Answer 1 (score: -1)
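template<int L>
class FP {
public:
    int n = 0;
    // Adds the values; the result type carries L + K.
    template<int K>
    FP<L+K> add(FP<K> a) const {
        FP<L+K> r;
        r.n = n + a.n;
        return r;
    }
    // static constexpr, so the call can be used as a template argument.
    template<int K>
    static constexpr int addS() {
        return L + K;
    }
};
int main()
{
    FP<1> n1;
    FP<2> n2;
    FP<FP<1>::addS<2>()> n3 = n1.add(n2);
    (void)n3;
}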
Answer 2 (score: -1)
$ awk '
BEGIN { FS=OFS="\"" }
{
    split($2, iabs, /[,;]/)          # split the quoted field on "," and ";"
    tmp = ""
    delete seen
    for (i=1; i in iabs; i++) {
        if (!seen[iabs[i]]++) {      # first time this code appears on the line
            tmp = (tmp == "" ? "" : tmp ";") iabs[i]
        }
    }
    $2 = tmp
}
1
' file
148422,0.72499999999999998,0.72499999999999998,0.72500000000165021,wpolityce.pl,300x250,standard,3,"IAB3;IAB11;IAB17;IAB12;IAB9;IAB15;IAB23;IAB13"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,728x90,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23;IAB9"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,750x100,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23;IAB9"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,750x200,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23;IAB9"
118243,0.72499999999999998,0.72499999999999998,0.72500000000058573,wpolityce.pl,750x300,standard,3,"IAB3;IAB11;IAB1;IAB12;IAB13;IAB23;IAB9"
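For someone working in Bash who prefers to avoid awk, the same per-line transformation can be sketched in pure Bash. This is an illustrative alternative, not one of the posted answers: dedupe_quoted_field is a hypothetical name, and the sketch assumes Bash 4+ (for associative arrays) and that every line ends with exactly one double-quoted field, as in the sample. Anything after the closing quote is not preserved.

```shell
#!/usr/bin/env bash
# Sketch, assuming Bash 4+ and one trailing double-quoted field per line.
dedupe_quoted_field() {
  local line prefix quoted token out
  local -a tokens
  local -A seen
  while IFS= read -r line; do
    prefix=${line%%\"*}     # everything before the first double quote
    quoted=${line#*\"}      # everything after the first double quote
    quoted=${quoted%\"*}    # strip the closing double quote
    IFS=',;' read -ra tokens <<< "$quoted"
    seen=()
    out=""
    for token in "${tokens[@]}"; do
      # skip empty tokens and tokens already emitted for this line
      [[ -z $token || -n ${seen[$token]} ]] && continue
      seen[$token]=1
      out+="${out:+;}$token"
    done
    printf '%s"%s"\n' "$prefix" "$out"
  done
}

# Usage: dedupe_quoted_field < file
```

A while-read loop in Bash is considerably slower than awk on large files, so the awk version is likely the better choice for big query dumps.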