LEFT OUTER JOIN in LINQ to Objects

Time: 2010-02-25 07:09:34

Tags: c# linq-to-objects

Consider the following code, where City and CityPlace are joined on CityCode. What I want to do is perform a LEFT OUTER JOIN between CityPlace and City.

City[] cities = new City[]{
 new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
 new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
 new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
 new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
};


CityPlace[] places = new CityPlace[]{
 new CityPlace{CityCode="0771",Place="Shankar Nagar"},
 new CityPlace{CityCode="0771",Place="Pandari"},
 new CityPlace{CityCode="0771",Place="Energy Park"},

 new CityPlace{CityCode="0751",Place="Baadaa"},
 new CityPlace{CityCode="0751",Place="Nai Sadak"},
 new CityPlace{CityCode="0751",Place="Jayendraganj"},
 new CityPlace{CityCode="0751",Place="Vinay Nagar"},

 new CityPlace{CityCode="0755",Place="Idgah Hills"},

 new CityPlace{CityCode="022",Place="Parel"},
 new CityPlace{CityCode="022",Place="Haaji Ali"},
 new CityPlace{CityCode="022",Place="Girgaon Beach"},

 new CityPlace{CityCode="0783",Place="Railway Station"}};

This is what I did:

var res = places.GroupJoin(cities,
                           p1=>p1.CityCode,
                           c1=>c1.CityCode,
                           (p2,c2s)=>new {Place=p2.Place,
                                CityName=c2s.Count()==0 ? "NO NAME" 
                                           : c2s.First().CityName });

foreach(var v in res)
 Console.WriteLine(v);

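Is this the standard approach, or a quick and dirty solution?

3 Answers:

Answer 0: (score: 10)

Your own answer is fine, but it is not very elegant. So, yes, it is a bit dirty. There is a standard way of doing left outer joins, which handles your example and would also handle cases where there are duplicate cities. Your example cannot handle duplicate cities, because any duplicates are ignored when you select c2s.First().

The standard left join steps are these: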

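  1. Create a hierarchy from your data using GroupJoin.
  2. Flatten the hierarchy with SelectMany.

    Your GroupJoin flattens the hierarchy in one step by ignoring everything except the first matching city. That is what is dirty about it. If you tried to use this code in reverse, by taking the cities and joining them to the places, you would get only one place per city! That is obviously bad. It is better to learn how to do a left join the proper way, and then it will always work.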
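To make the "in reverse" problem concrete, here is a minimal sketch. It assumes cut-down, hypothetical data in which anonymous types stand in for the City and CityPlace classes, and it is written as top-level statements (C# 9 or later). It compares the First() shortcut with the GroupJoin plus SelectMany pattern when cities are on the left side.

```csharp
using System;
using System.Linq;

// Hypothetical cut-down data: anonymous types stand in for City and CityPlace.
var cities = new[]
{
    new { CityCode = "0771", CityName = "Raipur" },
    new { CityCode = "0755", CityName = "Bhopal" },
};
var places = new[]
{
    new { CityCode = "0771", Place = "Shankar Nagar" },
    new { CityCode = "0771", Place = "Pandari" },
    new { CityCode = "0755", Place = "Idgah Hills" },
};

// Same shape as the question's query, but with cities on the left.
// First() keeps one place per city and silently drops the rest.
var firstOnly = cities.GroupJoin(
    places,
    city => city.CityCode,
    place => place.CityCode,
    (city, matchingPlaces) => new
    {
        city.CityName,
        Place = matchingPlaces.Any() ? matchingPlaces.First().Place : "NO PLACE"
    }).ToList();

// The standard pattern: every matching place becomes its own row.
var allRows = cities.GroupJoin(
        places,
        city => city.CityCode,
        place => place.CityCode,
        (city, matchingPlaces) => new { city, matchingPlaces })
    .SelectMany(
        item => item.matchingPlaces.Select(p => p.Place).DefaultIfEmpty("NO PLACE"),
        (item, place) => new { item.city.CityName, Place = place })
    .ToList();

Console.WriteLine(firstOnly.Count); // one row per city, so "Pandari" is lost
Console.WriteLine(allRows.Count);   // one row per matching place
```

With this data the shortcut yields two rows and the flattened version yields three, which is the difference the paragraph above describes.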

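    The SelectMany in step 2 is actually optional if you would rather keep the hierarchy and use nested foreach loops to display it, but I am assuming you want to display the data in a flat table format.

    If you just want to see the answer to your specific question, scroll down to the "Cities and Places" heading below, but first, here is a complete example using two simple string arrays.

    Abstract example with full explanation

    Here is a complete example using two arrays of letters instead of your code. I wanted to show a simpler example first. You can copy and paste this into LINQPad, set the Language to "C# Statements", and run it yourself if you like. I highly recommend LINQPad as a tool for testing all kinds of code, not just LINQ. Alternatively, you can create a console application in Visual Studio.

    Here is the code without many comments. Below it is a heavily commented version. You may want to jump to that one if you want to know exactly what each parameter means.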
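As a rough illustration of keeping the hierarchy, the sketch below stops after GroupJoin and walks the result with nested foreach loops. The data is hypothetical and cut down, anonymous types stand in for the real classes, and the code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical cut-down data; Mumbai deliberately has no places.
var cities = new[]
{
    new { CityCode = "0771", CityName = "Raipur" },
    new { CityCode = "022", CityName = "Mumbai" },
};
var places = new[]
{
    new { CityCode = "0771", Place = "Shankar Nagar" },
    new { CityCode = "0771", Place = "Pandari" },
};

// Step 1 only: each city is paired with its (possibly empty) list of places.
var hierarchy = cities.GroupJoin(
    places,
    city => city.CityCode,
    place => place.CityCode,
    (city, matchingPlaces) => new
    {
        city.CityName,
        Places = matchingPlaces.Select(p => p.Place).ToList()
    }).ToList();

// No SelectMany: the outer loop walks the cities, the inner loop their places.
foreach (var group in hierarchy)
{
    Console.WriteLine(group.CityName);
    foreach (var place in group.Places)
    {
        Console.WriteLine("    " + place);
    }
}
```

A city without places still appears in the outer loop; its inner loop simply runs zero times, so no DefaultIfEmpty is needed in this form.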

    var leftLetters = new string[]{ "A", "B", "C" };
    var rightLetters = new string[]{ "A", "B" };
    
    //Create a hierarchical collection that includes every left item paired with a collection of matching right items (which may be empty if there are no matching right items.)
    var groupJoin =
        leftLetters.GroupJoin(
            rightLetters, 
            leftLetter => leftLetter, 
            rightLetter => rightLetter, 
            ( leftLetter, matchingRightLetters ) => new { leftLetter, matchingRightLetters } 
        );
    
    //Flatten the groupJoin hierarchical collection with a SelectMany
    var selectMany = 
        groupJoin.SelectMany(           
            groupJoinItem => groupJoinItem.matchingRightLetters.DefaultIfEmpty( "MISSING" ),            
            ( groupJoinItem, rightLetter ) => new {
                LeftLetter = groupJoinItem.leftLetter, 
                RightLetter = rightLetter 
            }
        );
    
    //You can think of the elements of selectMany as "rows" as if this had been a left outer join in SQL. But this analogy breaks down rapidly if you are selecting objects instead of scalar values.
    foreach( var row in selectMany )
    {
        Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
    }
    

    Here is the output, which should be fairly obvious, since we all know what a left join is supposed to do.

    A, A
    B, B
    C, MISSING
    

    The heavily commented version:

    var leftLetters = new string[]{ "A", "B", "C" };
    var rightLetters = new string[]{ "A", "B" };
    
    //Create a hierarchical collection that includes every left item paired with a collection of matching right items (which may be empty if there are no matching right items.)
    var groupJoin =
        leftLetters.GroupJoin(
            rightLetters, //inner: the right hand collection in the join
            leftLetter => leftLetter, //outerKeySelector: There is no property to use as the join key, the letter *is* the key. So this lambda simply returns the parameter itself.
            rightLetter => rightLetter, //innerKeySelector: Same with the rightLetters
            ( leftLetter, matchingRightLetters ) => new { leftLetter, matchingRightLetters } //resultSelector: given an element from the left items, and its matching collection of right items, project them to some class. In this case we are using a new anonymous type. 
        );
    
    //Flatten the groupJoin hierarchical collection with a SelectMany
    var selectMany = 
        groupJoin.SelectMany(
            //collectionSelector: given a single element from our collection of group join items from above, provide a collection of its "right" items which we want to flatten out. In this case the right items are in a property of the groupJoinItem itself, but this does not need to be the case! We use DefaultIfEmpty to turn an empty collection into a new collection that has exactly one item instead: the string "MISSING".
            groupJoinItem => groupJoinItem.matchingRightLetters.DefaultIfEmpty( "MISSING" ), 
            //resultSelector: SelectMany does the flattening for us and this lambda gets invoked once for *each right item* in a given left item's collection of right items.
            ( 
                groupJoinItem, //The first parameter is one of the original group join items, including its entire collection of right items, but we will ignore that collection in the body of this lambda and just grab the leftLetter property.
                rightLetter //The second parameter is *one* of the matching right items from the collection of right items we selected in the first lambda we passed into SelectMany.
            )  
                => new {
                    LeftLetter = groupJoinItem.leftLetter, //groupJoinItem is one of the original items from the GroupJoin above. We just want the left letter from it.
                    RightLetter = rightLetter //This is one of the individual right letters, so just select it as-is.
                }
        );
    
    //You can think of the elements of selectMany as "rows" as if this had been a left outer join in SQL. But this analogy breaks down rapidly if you are selecting objects instead of scalar values.
    foreach( var row in selectMany )
    {
        Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
    }   
    

    Again, the output for reference:

    A, A
    B, B
    C, MISSING
    

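    The above use of LINQ is often called "method chaining". You take some collections and chain methods together to get what you want. (Most of the time you do not use variables to hold the individual expressions. You just write GroupJoin(...).SelectMany(...), which is why it is called "method chaining".) It is very verbose and explicit, and it takes a long time to write.

    Instead, we can use what are called "comprehensions", "query comprehensions", or "LINQ comprehensions". Comprehension is an old computer science term from the 1970s that, honestly, does not mean much to most people. Instead, people call them "LINQ queries" or "LINQ expressions", but those terms technically apply to method chaining as well, because both forms express the same query. (With IQueryable providers both forms build an expression tree; with LINQ to Objects the lambdas compile to delegates. Expression trees are outside the scope of this tutorial.) A LINQ comprehension is a SQL-like syntax for writing LINQ, but it is not SQL! It has nothing to do with actual SQL. Here is the same code written as a query comprehension: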
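For illustration, here is the same left join written as a single chain with no intermediate variables. This is a minimal sketch using the letter arrays from above, written as top-level statements (C# 9 or later); the short lambda parameter names are my own choice.

```csharp
using System;
using System.Linq;

var leftLetters = new[] { "A", "B", "C" };
var rightLetters = new[] { "A", "B" };

// GroupJoin(...).SelectMany(...) in one expression: no groupJoin variable in between.
var rows = leftLetters
    .GroupJoin(
        rightLetters,
        left => left,
        right => right,
        (left, matches) => new { left, matches })
    .SelectMany(
        item => item.matches.DefaultIfEmpty("MISSING"),
        (item, right) => item.left + ", " + right)
    .ToList();

foreach (var row in rows)
{
    Console.WriteLine(row);
}
```

The rows are the same three lines as in the output shown earlier; only the way the query is spelled has changed.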

    var leftLetters = new string[]{ "A", "B", "C" };
    var rightLetters = new string[]{ "A", "B" };
    
    var query = 
        from leftLetter in leftLetters
        join rightLetter in rightLetters
        on leftLetter equals rightLetter into matchingRightLetters
        from rightLetter in matchingRightLetters.DefaultIfEmpty( "MISSING" )
        select new
        {
            LeftLetter = leftLetter,
            RightLetter = rightLetter
        };
    
    foreach( var row in query )
    {
        Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
    }   
    

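    This compiles to exactly the same code as the example above, except that the parameter named "groupJoinItem" in the SelectMany will be given a compiler-generated name, something like "temp0", because that parameter does not exist explicitly in the comprehension version of the code.

    I think you can appreciate how much simpler this version of the code is. I always use this syntax when doing a left outer join. I never use GroupJoin with SelectMany. However, at first glance it makes very little sense. A join followed by into creates a GroupJoin. You first have to know this, and why you would want it. Then the second from indicates a SelectMany, which is not obvious. When you have two from keywords, you are effectively creating a cross join (Cartesian product), which is what SelectMany is doing. (Sort of.)

    For example, this query: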
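One informal way to check the equivalence at the level of results (not of generated code) is to run both forms side by side and compare the rows. The sketch below assumes the letter arrays from above and is written as top-level statements (C# 9 or later). It relies on two anonymous types with the same property names, types, and order being the same type within one assembly, and on anonymous types comparing by value.

```csharp
using System;
using System.Linq;

var leftLetters = new[] { "A", "B", "C" };
var rightLetters = new[] { "A", "B" };

// Method chaining form.
var viaMethods = leftLetters
    .GroupJoin(
        rightLetters,
        left => left,
        right => right,
        (left, matches) => new { left, matches })
    .SelectMany(
        item => item.matches.DefaultIfEmpty("MISSING"),
        (item, right) => new { LeftLetter = item.left, RightLetter = right });

// Query comprehension form.
var viaQuery =
    from l in leftLetters
    join r in rightLetters on l equals r into matchingRightLetters
    from r in matchingRightLetters.DefaultIfEmpty("MISSING")
    select new { LeftLetter = l, RightLetter = r };

// Anonymous types implement value-based Equals, so SequenceEqual compares row contents.
bool sameRows = viaMethods.SequenceEqual(viaQuery);
Console.WriteLine(sameRows);
```

This only shows that the two queries produce identical rows in identical order; inspecting the compiled output would need a decompiler.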

    from leftLetter in leftLetters
    from rightLetter in rightLetters
    select new
    {
        LeftLetter = leftLetter,
        RightLetter = rightLetter
    }
    

    would produce:

    A, A
    A, B
    B, A
    B, B
    C, A
    C, B
    

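    This is a basic cross join.

    So, back to our original left-join LINQ query: the first from of the query is the group join, and the second from expresses a cross join between each groupJoinItem and its own collection of matching right letters. It is sort of like this: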

    from groupJoinItem in groupJoin
    from rightLetter in groupJoinItem.matchingRightLetters
    select new{...}
    

    In fact, we could actually write it that way!

    var groupJoin = 
        from leftLetter in leftLetters
        join rightLetter in rightLetters
        on leftLetter equals rightLetter into matchingRightLetters
        select new 
        {
            LeftLetter = leftLetter,
            MatchingRightLetters = matchingRightLetters
        };
    
    
    var selectMany = 
        from groupJoinItem in groupJoin 
        from rightLetter in groupJoinItem.MatchingRightLetters.DefaultIfEmpty( "MISSING" )
        select new
        {
            LeftLetter = groupJoinItem.LeftLetter,
            RightLetter = rightLetter
        };
    

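    The selectMany expresses the following: "for every item in groupJoin, cross join it with its own MatchingRightLetters property, and concatenate all the results together." This gives exactly the same result as any of the left join code above.

    This may be too much explanation for such a simple question, but I dislike cargo cult programming (google it). You should know exactly what your code is doing, and why, otherwise you will not be able to tackle more difficult problems.

    Cities and Places

    So, here is the method chaining version of your code. It is a complete program, so people can run it if they like (use the "C# Program" language type in LINQPad, or create a console application with Visual Studio or the C# compiler).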

    void Main()
    {
        City[] cities = new City[]{
            new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
            new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
            new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
            new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
        };  
    
        CityPlace[] places = new CityPlace[]{
            new CityPlace{CityCode="0771",Place="Shankar Nagar"},
            new CityPlace{CityCode="0771",Place="Pandari"},
            new CityPlace{CityCode="0771",Place="Energy Park"},
    
            new CityPlace{CityCode="0751",Place="Baadaa"},
            new CityPlace{CityCode="0751",Place="Nai Sadak"},
            new CityPlace{CityCode="0751",Place="Jayendraganj"},
            new CityPlace{CityCode="0751",Place="Vinay Nagar"},
    
            new CityPlace{CityCode="0755",Place="Idgah Hills"},
    
            new CityPlace{CityCode="022",Place="Parel"},
            new CityPlace{CityCode="022",Place="Haaji Ali"},
            new CityPlace{CityCode="022",Place="Girgaon Beach"},
    
            new CityPlace{CityCode="0783",Place="Railway Station"}
        };
    
        var query = 
            places.GroupJoin(
                cities,
                place => place.CityCode,
                city => city.CityCode,
                ( place, matchingCities ) 
                    => new {
                        place,
                        matchingCities
                    }
            ).SelectMany(
                groupJoinItem => groupJoinItem.matchingCities.DefaultIfEmpty( new City{ CityName = "NO NAME" } ),
                ( groupJoinItem, city )
                    => new {
                        Place = groupJoinItem.place,
                        City = city
                    }
            );              
    
        foreach(var pair in query)
        {
            Console.WriteLine( pair.Place.Place + ": " + pair.City.CityName );
        }
    }
    
    class City
    {
        public string CityCode;
        public string CityName;
        public string CityPopulation;
    }
    
    class CityPlace
    {
        public string CityCode;
        public string Place;
    }
    

    Here is the output:

    Shankar Nagar: Raipur
    Pandari: Raipur
    Energy Park: Raipur
    Baadaa: Gwalior
    Nai Sadak: Gwalior
    Jayendraganj: Gwalior
    Vinay Nagar: Gwalior
    Idgah Hills: Bhopal
    Parel: Mumbai
    Haaji Ali: Mumbai
    Girgaon Beach: Mumbai
    Railway Station: NO NAME
    

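    Notice that DefaultIfEmpty returns a new instance of the actual City class, not just a string. This is because we are joining CityPlaces to actual City objects, not to strings. You could instead call DefaultIfEmpty() with no argument, and you would get a null City for "Railway Station", but then you would have to check for null in your foreach loop before calling pair.City.CityName. It is a matter of personal preference.

    Here is the same program using a query comprehension: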
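A minimal sketch of the parameterless DefaultIfEmpty() variant follows. It assumes hypothetical cut-down data with anonymous types in place of City and CityPlace, and it is written as top-level statements (C# 9 or later). As one possible choice, the null check is done in the select with the null-conditional operator rather than in the foreach loop.

```csharp
using System;
using System.Linq;

// Hypothetical cut-down data; "0783" has no matching city.
var cities = new[]
{
    new { CityCode = "0771", CityName = "Raipur" },
};
var places = new[]
{
    new { CityCode = "0771", Place = "Pandari" },
    new { CityCode = "0783", Place = "Railway Station" },
};

var query =
    from place in places
    join city in cities on place.CityCode equals city.CityCode into matchingCities
    // No argument: an unmatched place yields default(T), which is null for a reference type.
    from city in matchingCities.DefaultIfEmpty()
    // city may be null here, so guard the member access.
    select new { place.Place, CityName = city?.CityName ?? "NO NAME" };

var lines = query.Select(row => row.Place + ": " + row.CityName).ToList();

foreach (var line in lines)
{
    Console.WriteLine(line);
}
```

Whether the fallback lives in the query or in the loop that consumes it is the same personal preference mentioned above; the query-side check keeps the consuming code free of null handling.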

    void Main()
    {
        City[] cities = new City[]{
            new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
            new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
            new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
            new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
        };  
    
        CityPlace[] places = new CityPlace[]{
            new CityPlace{CityCode="0771",Place="Shankar Nagar"},
            new CityPlace{CityCode="0771",Place="Pandari"},
            new CityPlace{CityCode="0771",Place="Energy Park"},
    
            new CityPlace{CityCode="0751",Place="Baadaa"},
            new CityPlace{CityCode="0751",Place="Nai Sadak"},
            new CityPlace{CityCode="0751",Place="Jayendraganj"},
            new CityPlace{CityCode="0751",Place="Vinay Nagar"},
    
            new CityPlace{CityCode="0755",Place="Idgah Hills"},
    
            new CityPlace{CityCode="022",Place="Parel"},
            new CityPlace{CityCode="022",Place="Haaji Ali"},
            new CityPlace{CityCode="022",Place="Girgaon Beach"},
    
            new CityPlace{CityCode="0783",Place="Railway Station"}
        };
    
        var query = 
            from place in places
            join city in cities
            on place.CityCode equals city.CityCode into matchingCities
            from city in matchingCities.DefaultIfEmpty( new City{ CityName = "NO NAME" } )
            select new {
                Place = place,
                City = city
            };      
    
        foreach(var pair in query)
        {
            Console.WriteLine( pair.Place.Place + ": " + pair.City.CityName );
        }
    }
    
    class City
    {
        public string CityCode;
        public string CityName;
        public string CityPopulation;
    }
    
    class CityPlace
    {
        public string CityCode;
        public string Place;
    }
    

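    As a long-time SQL user, I prefer the query comprehension version. Once you know what the individual parts of the query do, it is much easier for someone else to read the intent of the code.

    Happy programming!

Answer 1: (score: 8)

Here is a LINQ query version: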

var noCity = new City {CityName = "NO NAME"};
var anotherway = from p in places
                 join c in cities on p.CityCode equals c.CityCode into merge
                 from c in merge.DefaultIfEmpty(noCity)
                 select new { p.Place, c.CityName };

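I think using DefaultIfEmpty() makes it a bit clearer.

All in all, I find outer joins in LINQ quite confusing. It is one of the few places where I find SQL queries dramatically superior.

Answer 2: (score: 3)

In your case you are not grouping the records, so do not use your solution. You can use ScottS's solution or the query below.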

var res = from p in places
          select new
          {
              Place = p.Place,
              CityName = (from c in cities
                          where p.CityCode == c.CityCode
                          select c.CityName).DefaultIfEmpty("NO NAME").ElementAtOrDefault(0)
          };