Question

I want to find all persons who share the same value of a given attribute with another person. I have the following query:

SELECT
  p1.keyValue,
  p1.Displayname,
  p2.keyValue,
  p2.Displayname,
  p1.ImportantAttrName,
  p1.ImportantAttrValue
FROM Person p1 WITH (NOLOCK)
JOIN Person p2 WITH (NOLOCK)
  ON p1.ImportantAttr = p2.ImportantAttr
WHERE p1.keyValue != p2.keyValue
AND p1.ImportantAttrValue = p2.ImportantAttrValue

With this query I get every entry twice, because each person appears both as p1 and as p2. So the result looks like this:

I123    Freddy Krüger   A123    The Horsemen   Moviecategorie    Horror
A123    The Horsemen    I123    Freddy Krüger   Moviecategorie    Horror

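For analysis purposes, however, it would be nice to get each combination of p1.keyValue and p2.keyValue only once, regardless of which of the two columns each value ends up in.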

So far I have been exporting to Excel and cleaning it up there, but is there a way to fix the query itself so that I don't get these "duplicates"?

Answer 1

Use p1.keyValue < p2.keyValue in the WHERE clause instead of !=:

SELECT
    p1.keyValue,
    p1.Displayname,
    p2.keyValue,
    p2.Displayname, 
    p1.ImportantAttrName,
    p1.ImportantAttrValue
FROM Person p1 WITH (NOLOCK)
INNER JOIN Person p2 WITH (NOLOCK)
    ON p1.ImportantAttr = p2.ImportantAttr
WHERE
    p1.keyValue < p2.keyValue AND       -- change is here
    p1.ImportantAttrValue = p2.ImportantAttrValue;

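This ensures you won't see duplicate pairs. To see why, consider two key values 1 and 2: with the condition !=, both 1-2 and 2-1 satisfy it, but with < only 1-2 does. The same holds for string keys such as I123 and A123, because < compares them according to the column's collation, so for two distinct keys exactly one ordering passes.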
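To check this reasoning, here is a small sketch that runs both variants against an in-memory SQLite database from Python. The Person schema (with ImportantAttr holding the category name) and the sample rows are assumptions modeled on the question; WITH (NOLOCK) is a SQL Server table hint that SQLite does not support, so it is left out.

```python
import sqlite3

# Hypothetical sample data modeled on the example rows in the question.
ROWS = [
    ("I123", "Freddy Krüger", "Moviecategorie", "Horror"),
    ("A123", "The Horsemen", "Moviecategorie", "Horror"),
    ("P321", "The Ironman", "Generalcategorie", "Test"),
]


def self_join_pairs(op: str) -> list:
    """Return (p1.keyValue, p2.keyValue) pairs, comparing the keys with op."""
    # op is spliced into the SQL text, so only allow the two operators we compare
    if op not in ("!=", "<"):
        raise ValueError("op must be '!=' or '<'")
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE Person (keyValue TEXT, Displayname TEXT, "
            "ImportantAttr TEXT, ImportantAttrValue TEXT)"
        )
        con.executemany("INSERT INTO Person VALUES (?, ?, ?, ?)", ROWS)
        sql = f"""
            SELECT p1.keyValue, p2.keyValue
            FROM Person p1
            JOIN Person p2 ON p1.ImportantAttr = p2.ImportantAttr
            WHERE p1.keyValue {op} p2.keyValue
              AND p1.ImportantAttrValue = p2.ImportantAttrValue
            ORDER BY p1.keyValue, p2.keyValue
        """
        return con.execute(sql).fetchall()
    finally:
        con.close()


print(self_join_pairs("!="))  # [('A123', 'I123'), ('I123', 'A123')]
print(self_join_pairs("<"))   # [('A123', 'I123')]
```

With != the matching pair comes back in both orders; with < only the ordering where the smaller key is on the left survives.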

Answer 2

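You can change the join condition from:

on p1.ImportantAttr = p2.ImportantAttr

to:

on p1.ImportantAttr = p2.ImportantAttr and p1.keyValue < p2.keyValue

The whole query could then look like this:

SELECT
  p1.keyValue,
  p1.Displayname,
  p2.keyValue,
  p2.Displayname,
  p1.ImportantAttrName,
  p1.ImportantAttrValue
FROM Person p1 WITH (NOLOCK)
JOIN Person p2 WITH (NOLOCK)
  ON p1.ImportantAttr = p2.ImportantAttr
  AND p1.keyValue < p2.keyValue
WHERE p1.ImportantAttrValue = p2.ImportantAttrValue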
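For an INNER JOIN, putting p1.keyValue < p2.keyValue in the ON clause or in the WHERE clause returns the same rows (for outer joins the placement does matter). The sketch below checks this in SQLite from Python; the schema and rows are assumptions, and the third person "B456" is made up to show that three people sharing a value yield 3 pairs (n(n-1)/2) rather than 6.

```python
import sqlite3

# Hypothetical data: three people share Moviecategorie/Horror ("B456" is made up).
ROWS = [
    ("I123", "Freddy Krüger", "Moviecategorie", "Horror"),
    ("A123", "The Horsemen", "Moviecategorie", "Horror"),
    ("B456", "Hypothetical Person", "Moviecategorie", "Horror"),
    ("P321", "The Ironman", "Generalcategorie", "Test"),
]

# Key comparison placed in the ON clause.
IN_ON = """
    SELECT p1.keyValue, p2.keyValue
    FROM Person p1
    JOIN Person p2
      ON p1.ImportantAttr = p2.ImportantAttr AND p1.keyValue < p2.keyValue
    WHERE p1.ImportantAttrValue = p2.ImportantAttrValue
    ORDER BY 1, 2
"""

# Same comparison placed in the WHERE clause.
IN_WHERE = """
    SELECT p1.keyValue, p2.keyValue
    FROM Person p1
    JOIN Person p2 ON p1.ImportantAttr = p2.ImportantAttr
    WHERE p1.keyValue < p2.keyValue
      AND p1.ImportantAttrValue = p2.ImportantAttrValue
    ORDER BY 1, 2
"""


def run(sql: str) -> list:
    """Load the sample rows into a fresh in-memory table and run sql."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE Person (keyValue TEXT, Displayname TEXT, "
            "ImportantAttr TEXT, ImportantAttrValue TEXT)"
        )
        con.executemany("INSERT INTO Person VALUES (?, ?, ?, ?)", ROWS)
        return con.execute(sql).fetchall()
    finally:
        con.close()


on_rows = run(IN_ON)
print(on_rows)                   # [('A123', 'B456'), ('A123', 'I123'), ('B456', 'I123')]
print(on_rows == run(IN_WHERE))  # True
```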

Answer 3

This is a different approach, but it gets the expected result.

Use COUNT(*) with OVER (PARTITION BY ...):

select count(*) over(partition by Attr) as RepeatCount, * from (
select keyValue,DisplayName,ImportantAttr + ' ' +ImportantAttrValue as Attr
  from tblTest) tblTemp

With the query above you get a result like this:

RepeatCount  keyValue  DisplayName    Attr
1            P321      The Ironman    Generalcategorie Test
2            I123      Freddy Krüger  Moviecategorie Horror
2            A123      The Horsemen   Moviecategorie Horror

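From this result you can filter for the records with RepeatCount > 1. Note that this lists each affected person once, rather than each pair of persons.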
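A window function cannot be referenced in the WHERE clause of the same SELECT, so one way to apply the RepeatCount > 1 filter is to wrap the counting query in an outer query. The sketch below does this in SQLite from Python; the tblTest schema and rows are assumptions matching the sample output, SQLite concatenates strings with || where T-SQL uses +, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical rows matching the sample output of the counting query.
ROWS = [
    ("P321", "The Ironman", "Generalcategorie", "Test"),
    ("I123", "Freddy Krüger", "Moviecategorie", "Horror"),
    ("A123", "The Horsemen", "Moviecategorie", "Horror"),
]

# The inner query computes RepeatCount per Attr; the outer query filters on it.
SQL = """
    SELECT RepeatCount, keyValue, DisplayName, Attr
    FROM (
        SELECT COUNT(*) OVER (PARTITION BY Attr) AS RepeatCount,
               keyValue, DisplayName, Attr
        FROM (
            SELECT keyValue, DisplayName,
                   ImportantAttr || ' ' || ImportantAttrValue AS Attr
            FROM tblTest
        ) AS tblTemp
    ) AS counted
    WHERE RepeatCount > 1
    ORDER BY keyValue
"""


def repeated_rows() -> list:
    """Return only the people whose attribute/value combination occurs more than once."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE tblTest (keyValue TEXT, DisplayName TEXT, "
            "ImportantAttr TEXT, ImportantAttrValue TEXT)"
        )
        con.executemany("INSERT INTO tblTest VALUES (?, ?, ?, ?)", ROWS)
        return con.execute(SQL).fetchall()
    finally:
        con.close()


for row in repeated_rows():
    print(row)
# (2, 'A123', 'The Horsemen', 'Moviecategorie Horror')
# (2, 'I123', 'Freddy Krüger', 'Moviecategorie Horror')
```

Partitioning by a concatenated string can, in rare cases, merge different combinations (for example 'a b' + 'c' and 'a' + 'b c' both give 'a b c'); PARTITION BY ImportantAttr, ImportantAttrValue avoids that.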

Filtering combinations of column values in SQL
