$ cat data.csv
ID,City,Zip,Flag
1,A,95126,0
2,A,95126,1
3,A,95126,1
4,B,95124,0
5,B,95124,1
6,C,95124,0
7,C,95127,1
8,C,95127,0
9,C,95127,1
(a) "ID" above is the primary key (unique).
(b) Each "City" and "Zip" combination has at most one ID with Flag = 0, although it may contain multiple IDs with Flag = 1.
(c) Flag can be 0 or 1.
CREATE TABLE test (ID STRING, City STRING, Zip STRING, Flag INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

LOAD DATA LOCAL INPATH "./data.csv" OVERWRITE INTO TABLE test;
The expected result is:
ID,City,Zip,Flag
1,A,95126,0
2,A,95126,1
4,B,95124,0
5,B,95124,1
7,C,95127,1
8,C,95127,0
Any useful hints on how to do this pairing in Hive or Python?
Answer 0 (score: 0)
Try this.
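-- Keep, per City/Zip group, the Flag = 0 row and the lowest-ID Flag = 1 row
-- (rule inferred from the expected output); groups missing either flag are dropped.
-- ID is a STRING column, so MIN compares IDs lexicographically.
SELECT t.*
FROM test t INNER JOIN (
  SELECT City, Zip,
         MIN(CASE WHEN Flag = 0 THEN ID END) AS id0,
         MIN(CASE WHEN Flag = 1 THEN ID END) AS id1
  FROM test
  GROUP BY City, Zip
) p
ON t.City = p.City AND t.Zip = p.Zip
WHERE p.id0 IS NOT NULL AND p.id1 IS NOT NULL
  AND (t.ID = p.id0 OR t.ID = p.id1);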
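
Since the question also mentions Python, here is a minimal sketch of the same pairing using only the standard library. It assumes the rule implied by the expected output: for each City/Zip combination, keep the single Flag = 0 row together with the lowest-ID Flag = 1 row, and drop combinations that lack either one. The names `SAMPLE` and `pair_rows` are made up for illustration, and the data is inlined rather than read from `data.csv` so the snippet runs on its own.

```python
import csv
import io

# Sample data copied from the question (inlined so the sketch is self-contained).
SAMPLE = """ID,City,Zip,Flag
1,A,95126,0
2,A,95126,1
3,A,95126,1
4,B,95124,0
5,B,95124,1
6,C,95124,0
7,C,95127,1
8,C,95127,0
9,C,95127,1
"""


def pair_rows(rows):
    """Return, per (City, Zip) group, the Flag=0 row and the lowest-ID Flag=1 row.

    Assumption: "lowest ID" is the tie-breaker among Flag=1 rows, inferred
    from the expected output in the question.
    """
    groups = {}
    for row in rows:
        groups.setdefault((row["City"], row["Zip"]), []).append(row)

    paired = []
    for members in groups.values():
        zeros = [r for r in members if r["Flag"] == "0"]
        ones = sorted(
            (r for r in members if r["Flag"] == "1"),
            key=lambda r: int(r["ID"]),
        )
        # Only groups that contain both flags produce a pair.
        if zeros and ones:
            paired.extend([zeros[0], ones[0]])
    # Present the result in ID order, like the expected output.
    return sorted(paired, key=lambda r: int(r["ID"]))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = pair_rows(rows)
for r in result:
    print(",".join(r[k] for k in ("ID", "City", "Zip", "Flag")))
```

To run it against the real file, replace `io.StringIO(SAMPLE)` with an opened `data.csv`. IDs are compared as integers here, which matters once they go past one digit.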