What is the best way to compute similarity between rows based on an association table?

Posted: 2010-06-13 01:15:23

Tags: c# .net sql sql-server orm

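Suppose every person has a set of favorite books.

So I have a table:

  • The association between people and books (an M×N join table)

I want to find the people who are similar to Person1 based on the overlap of their favorite books: the more books two people have in common, the more similar they are.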
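The measure described here is simply the size of the intersection of two people's book sets. A minimal C# sketch, assuming each person's favorites are already loaded into a set of book IDs; `BooksInCommon` and the sample IDs are hypothetical, and the code uses C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: similarity as described in the question,
// i.e. the number of favorite books two people share.
int BooksInCommon(ISet<int> booksA, ISet<int> booksB) =>
    booksA.Count(b => booksB.Contains(b));

// Hypothetical favorite-book IDs for three people.
var person1Books = new HashSet<int> { 1, 2, 3, 4, 5 };
var person2Books = new HashSet<int> { 2, 3, 5 };
var person4Books = new HashSet<int> { 1, 4 };

// Person 2 shares more books with Person 1 than Person 4 does,
// so under this measure Person 2 is "more similar" to Person 1.
Console.WriteLine(BooksInCommon(person1Books, person2Books)); // 3
Console.WriteLine(BooksInCommon(person1Books, person4Books)); // 2
```

Note that a raw count favors people who simply like many books; whether that matters depends on the data.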

I don't have to solve this in SQL alone; I can also do part of it in code. I'm using SQL Server 2008 and C#.

What solution would an expert use?

3 answers:

Answer 0 (score: 2)

This may not be the most efficient approach, but it is relatively simple:

WITH SimilarBookPrefs (person_id, similar_person_id, booksInCommon) AS
(
 SELECT p1.person_id, p2.person_id AS similar_person_id,
 /* Find the number of books p1 and p2 have in common */
   (SELECT COUNT(*) FROM PersonBook pb1
      JOIN PersonBook pb2 ON pb1.book_id = pb2.book_id
    WHERE pb1.person_id = p1.person_id AND pb2.person_id = p2.person_id) AS booksInCommon
   FROM Person p1 CROSS JOIN Person p2
   WHERE p1.person_id <> p2.person_id /* skip self-pairs: everyone shares all of their own books */
)

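This gives you, for each person, every other person and the number of books they have in common.

To get the most similar person, add the following in the same query (a CTE is only visible to the statement that immediately follows it):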

SELECT TOP 1 similar_person_id FROM SimilarBookPrefs
   WHERE person_id = <person_to_match>
   ORDER BY booksInCommon DESC;

The first part does not have to be a CTE (i.e. WITH ...); it could also be a view or even a derived table. I used a CTE for brevity.
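Part of why this may not be the most efficient approach is that the CROSS JOIN evaluates the correlated subquery for every pair of people, including pairs with nothing in common. A common alternative is to start from the join table and pair up rows that share a book (in SQL, a self-join of PersonBook on book_id grouped by the two person IDs), so only pairs with at least one shared book are ever counted. The C# sketch below shows that idea in memory; it assumes the (PersonId, BookId) rows have already been loaded, uses hypothetical names and sample data, and relies on C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample (PersonId, BookId) rows, assumed to come from the PersonBook join table.
var personBooks = new List<(int PersonId, int BookId)>
{
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
    (2, 2), (2, 3), (2, 5),
    (3, 2), (3, 4), (3, 5),
    (4, 1), (4, 4),
    (5, 1), (5, 3), (5, 5)
};

// Counts shared books for every ordered pair (person, other) that shares at least one book.
// The work grows with the sum, over books, of the squared number of fans of each book,
// rather than with the square of the number of people.
Dictionary<(int PersonId, int OtherId), int> CountBooksInCommon(
    IEnumerable<(int PersonId, int BookId)> rows)
{
    var counts = new Dictionary<(int PersonId, int OtherId), int>();

    // Inverted index: for each book, the people who list it as a favorite.
    foreach (var fans in rows.GroupBy(r => r.BookId, r => r.PersonId))
    {
        var people = fans.Distinct().ToList();
        foreach (var p in people)
            foreach (var q in people)
            {
                if (p == q) continue; // a person is not compared with themselves
                counts.TryGetValue((p, q), out var n);
                counts[(p, q)] = n + 1;
            }
    }
    return counts;
}

var common = CountBooksInCommon(personBooks);

// In-memory counterpart of SELECT TOP 1 ... WHERE person_id = 4 ORDER BY booksInCommon DESC.
// First() throws if person 4 shares no books with anyone.
var best = common.Where(kv => kv.Key.PersonId == 4)
                 .OrderByDescending(kv => kv.Value)
                 .First();
Console.WriteLine($"{best.Key.OtherId}\t{best.Value}"); // prints 1 and 2, tab-separated
```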

Answer 1 (score: 1)

If I were doing this in C#, I would probably solve it like this:

var query = from personBook in personBooks
            where personBook.PersonId != basePersonId // ID of person to match
            join bookbase in personBooks
            on personBook.BookId equals bookbase.BookId
            where bookbase.PersonId == basePersonId // ID of person to match
            join person in persons 
            on personBook.PersonId equals person.Id 
            group person by person into bookgroup
            select new
            {
                Person = bookgroup.Key, 
                BooksInCommon = bookgroup.Count()
            };

This could be done through Entity Framework or LINQ to SQL, or translated fairly directly into plain SQL.
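The query above returns every person with at least one book in common, but not in order of similarity. To get the single most similar person (the counterpart of SELECT TOP 1 ... ORDER BY booksInCommon DESC in the SQL answer), the same query can be extended with `let` and `orderby`. This sketch swaps the Person/PersonBook classes for value tuples so it stands alone (C# 9 top-level statements); the tie-break on person ID is an added assumption for deterministic output:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var persons = new List<(int Id, string Name)>
{
    (1, "Jane"), (2, "Joan"), (3, "Jim"), (4, "John"), (5, "Jill")
};
var personBooks = new List<(int PersonId, int BookId)>
{
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
    (2, 2), (2, 3), (2, 5),
    (3, 2), (3, 4), (3, 5),
    (4, 1), (4, 4),
    (5, 1), (5, 3), (5, 5)
};
int basePersonId = 4; // ID of the person to find similar people for

// Same joins and grouping as the answer's query, plus ordering by the shared-book count.
var query = from personBook in personBooks
            where personBook.PersonId != basePersonId
            join bookbase in personBooks on personBook.BookId equals bookbase.BookId
            where bookbase.PersonId == basePersonId
            join person in persons on personBook.PersonId equals person.Id
            group person by person into bookgroup
            let booksInCommon = bookgroup.Count()
            orderby booksInCommon descending, bookgroup.Key.Id // tie-break is an assumption
            select (Person: bookgroup.Key, BooksInCommon: booksInCommon);

var ranked = query.ToList();
foreach (var (p, count) in ranked)
    Console.WriteLine($"{p.Name}\t{count}");

// Most similar person, or null when nobody shares a book with the base person.
var mostSimilar = ranked.Count > 0 ? ranked[0].Person.Name : null;
Console.WriteLine(mostSimilar);
```

With LINQ to SQL or Entity Framework, the ordering would normally be translated into the generated SQL as well, so only the top rows need to be fetched (e.g. by adding Take(1)).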

Complete example code:

using System;
using System.Collections.Generic;
using System.Linq;

class CommonBooks
{
    static void Main()
    {
        List<Person> persons = new List<Person>()
        {
            new Person(1, "Jane"), new Person(2, "Joan"), new Person(3, "Jim"), new Person(4, "John"), new Person(5, "Jill")
        };

        List<Book> books = new List<Book>()
        {
            new Book(1), new Book(2), new Book(3), new Book(4), new Book(5)
        };

        List<PersonBook> personBooks = new List<PersonBook>()
        {
            new PersonBook(1,1), new PersonBook(1,2), new PersonBook(1,3), new PersonBook(1,4), new PersonBook(1,5), 
            new PersonBook(2,2), new PersonBook(2,3), new PersonBook(2,5), 
            new PersonBook(3,2), new PersonBook(3,4), new PersonBook(3,5), 
            new PersonBook(4,1), new PersonBook(4,4),
            new PersonBook(5,1), new PersonBook(5,3), new PersonBook(5,5)
        };

        int basePersonId = 4; // ID of the person to find similar people for

        var query = from personBook in personBooks
                    where personBook.PersonId != basePersonId
                    join bookbase in personBooks
                    on personBook.BookId equals bookbase.BookId
                    where bookbase.PersonId == basePersonId
                    join person in persons
                    on personBook.PersonId equals person.Id
                    group person by person into bookgroup
                    select new
                    {
                        Person = bookgroup.Key,
                        BooksInCommon = bookgroup.Count()
                    };

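        // With basePersonId = 4 (John) this prints Jane 2, Jim 1 and Jill 1, tab-separated.
        // Joan shares no books with John, so the inner join drops her from the results.
        foreach (var item in query)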
        {
            Console.WriteLine("{0}\t{1}", item.Person.Name, item.BooksInCommon);
        }

        Console.Read();
    }
}

class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Person(int id, string name) { Id = id; Name = name; }
}

class Book
{
    public int Id { get; set; }
    public Book(int id) { Id = id; }
}

class PersonBook
{
    public int PersonId { get; set; }
    public int BookId { get; set; }
    public PersonBook(int personId, int bookId) { PersonId = personId; BookId = bookId; }
}

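Answer 2 (score: 0)

The problem you describe is generally known as "collaborative filtering" and is tackled with "recommender systems". Searching for either term should turn up plenty of useful material.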
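To give a flavour of what such a system does, here is a minimal sketch of user-based collaborative filtering on the sample data from Answer 1's full example. Two choices are assumptions, not part of the answer: similarity is the Jaccard index (shared books divided by the size of the union of both sets), which normalizes for people who simply like many books, and a candidate book's score is the sum of the similarities of the other people who like it. All names are hypothetical; the code uses C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each person's favorite books (same sample data as Answer 1's full example).
var favorites = new Dictionary<int, HashSet<int>>
{
    [1] = new HashSet<int> { 1, 2, 3, 4, 5 },
    [2] = new HashSet<int> { 2, 3, 5 },
    [3] = new HashSet<int> { 2, 4, 5 },
    [4] = new HashSet<int> { 1, 4 },
    [5] = new HashSet<int> { 1, 3, 5 },
};

// Jaccard index: |A ∩ B| / |A ∪ B|, between 0 and 1; defined as 0 when both sets are empty.
double Jaccard(HashSet<int> a, HashSet<int> b)
{
    int shared = a.Count(x => b.Contains(x));
    int union = a.Count + b.Count - shared;
    return union == 0 ? 0.0 : (double)shared / union;
}

// Scores books the target has not listed by summing the similarity
// of every other person who likes them, highest score first.
List<(int BookId, double Score)> Recommend(int targetId, int maxResults)
{
    var mine = favorites[targetId];
    var scores = new Dictionary<int, double>();

    foreach (var kv in favorites)
    {
        if (kv.Key == targetId) continue;
        double sim = Jaccard(mine, kv.Value);
        if (sim == 0) continue; // people with nothing in common contribute nothing

        foreach (var book in kv.Value)
        {
            if (mine.Contains(book)) continue; // only recommend books the target hasn't listed
            scores.TryGetValue(book, out var s);
            scores[book] = s + sim;
        }
    }

    return scores.OrderByDescending(kv => kv.Value)
                 .ThenBy(kv => kv.Key)
                 .Take(maxResults)
                 .Select(kv => (BookId: kv.Key, Score: kv.Value))
                 .ToList();
}

foreach (var (bookId, score) in Recommend(4, 3))
    Console.WriteLine($"book {bookId}: {score:F2}");
```

Production recommender systems typically add many refinements (weighting, neighbourhood size limits, item-based variants, matrix factorization), which the search terms in this answer will lead to.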