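Suppose every person has a set of favorite books.
So I have a table:
I want to find the people most similar to Person1 based on how much their favorite books overlap. That is: the more books two people have in common, the more similar they are.
I don't have to solve this with SQL alone; I can also write code. I'm using SQL Server 2008 and C#.
What solution would an expert use?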
Answer 0 (score: 2)
This may not be the most efficient approach, but it is relatively simple:
WITH SimilarBookPrefs(person_id, similar_person_id, booksInCommon) AS
(
    SELECT p1.person_id, p2.person_id AS similar_person_id,
        /* Find the number of books p1 and p2 have in common */
        (SELECT COUNT(*)
         FROM PersonBook pb1
         JOIN PersonBook pb2 ON pb1.book_id = pb2.book_id
         WHERE pb1.person_id = p1.person_id AND pb2.person_id = p2.person_id) AS booksInCommon
    FROM Person p1 CROSS JOIN Person p2
    WHERE p1.person_id <> p2.person_id /* do not pair a person with themselves */
)
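This gives you, for each person, a list of the other people and the number of books they have in common.
To get the most similar person, add the following (in the same query):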
SELECT TOP 1 similar_person_id FROM SimilarBookPrefs
WHERE person_id = <person_to_match>
ORDER BY booksInCommon DESC;
The first part doesn't have to be a CTE (i.e. WITH ...); it could be a view or even a derived table. I used a CTE here for brevity.
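If you want to sanity-check the idea without a SQL Server instance, here is a minimal Python sketch that runs a comparable query against an in-memory SQLite database. Treat it as an illustration only: the function name `most_similar` and the sample rows are made up, SQLite uses LIMIT where SQL Server uses TOP, and the query is a plain self-join with GROUP BY rather than the correlated subquery above.

```python
import sqlite3

def most_similar(conn, person_id, limit=1):
    """Return (other_person_id, books_in_common) rows, best match first."""
    sql = """
        SELECT pb2.person_id, COUNT(*) AS books_in_common
        FROM PersonBook pb1
        JOIN PersonBook pb2
          ON pb1.book_id = pb2.book_id
         AND pb1.person_id <> pb2.person_id
        WHERE pb1.person_id = ?
        GROUP BY pb2.person_id
        ORDER BY books_in_common DESC, pb2.person_id
        LIMIT ?
    """
    return conn.execute(sql, (person_id, limit)).fetchall()

# Hypothetical sample data: (person_id, book_id) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PersonBook (person_id INTEGER, book_id INTEGER)")
rows = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3), (3, 4)]
conn.executemany("INSERT INTO PersonBook VALUES (?, ?)", rows)

print(most_similar(conn, 1, limit=2))  # [(2, 2), (3, 1)]
```

One difference from the CROSS JOIN version: people who share no books with the target do not appear at all here, whereas the CTE lists them with a count of 0.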
Answer 1 (score: 1)
If I were doing this in C#, I would probably solve it like this:
var query = from personBook in personBooks
            where personBook.PersonId != basePersonId // ID of person to match
            join bookbase in personBooks
                on personBook.BookId equals bookbase.BookId
            where bookbase.PersonId == basePersonId // ID of person to match
            join person in persons
                on personBook.PersonId equals person.Id
            group person by person into bookgroup
            select new
            {
                Person = bookgroup.Key,
                BooksInCommon = bookgroup.Count()
            };
This could be done through Entity Framework or LINQ to SQL, or translated fairly directly into SQL.
Complete sample code:
using System;
using System.Collections.Generic;
using System.Linq;

class CommonBooks
{
    static void Main()
    {
        List<Person> persons = new List<Person>()
        {
            new Person(1, "Jane"), new Person(2, "Joan"), new Person(3, "Jim"), new Person(4, "John"), new Person(5, "Jill")
        };
        List<Book> books = new List<Book>()
        {
            new Book(1), new Book(2), new Book(3), new Book(4), new Book(5)
        };
        // Each entry is (personId, bookId): one favorite book of one person.
        List<PersonBook> personBooks = new List<PersonBook>()
        {
            new PersonBook(1,1), new PersonBook(1,2), new PersonBook(1,3), new PersonBook(1,4), new PersonBook(1,5),
            new PersonBook(2,2), new PersonBook(2,3), new PersonBook(2,5),
            new PersonBook(3,2), new PersonBook(3,4), new PersonBook(3,5),
            new PersonBook(4,1), new PersonBook(4,4),
            new PersonBook(5,1), new PersonBook(5,3), new PersonBook(5,5)
        };
        int basePersonId = 4; // person to match likeness
        // Pair every other person's book with the same book owned by the base
        // person, then count the matching pairs per person.
        var query = from personBook in personBooks
                    where personBook.PersonId != basePersonId
                    join bookbase in personBooks
                        on personBook.BookId equals bookbase.BookId
                    where bookbase.PersonId == basePersonId
                    join person in persons
                        on personBook.PersonId equals person.Id
                    group person by person into bookgroup
                    select new
                    {
                        Person = bookgroup.Key,
                        BooksInCommon = bookgroup.Count()
                    };
        foreach (var item in query)
        {
            Console.WriteLine("{0}\t{1}", item.Person.Name, item.BooksInCommon);
        }
        Console.Read();
    }
}

class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Person(int id, string name) { Id = id; Name = name; }
}

class Book
{
    public int Id { get; set; }
    public Book(int id) { Id = id; }
}

class PersonBook
{
    public int PersonId { get; set; }
    public int BookId { get; set; }
    public PersonBook(int personId, int bookId) { PersonId = personId; BookId = bookId; }
}
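Answer 2 (score: 0)
The problem you describe is usually called "collaborative filtering" and is handled with "recommender systems". Searching for either of those terms should turn up plenty of useful information.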
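For a feel of the simplest form of this, here is a small Python sketch that ranks other people by the number of shared favorites. It is only an illustration under my own assumptions: `rank_similar` is a hypothetical helper, the input is a list of (person_id, book_id) pairs, and the Jaccard score (shared books divided by the size of the combined set) is an optional normalisation I added for tie-breaking, not something the question asked for.

```python
from collections import defaultdict

def rank_similar(pairs, target):
    """Rank other people by overlap with `target`'s books.

    `pairs` is an iterable of (person_id, book_id) tuples.
    Returns (person_id, books_in_common, jaccard) tuples, best first.
    """
    books = defaultdict(set)
    for person_id, book_id in pairs:
        books[person_id].add(book_id)

    mine = books.get(target, set())
    ranked = []
    for other, theirs in books.items():
        if other == target:
            continue
        common = len(mine & theirs)
        if common == 0:
            continue  # no shared books, so not a candidate
        jaccard = common / len(mine | theirs)
        ranked.append((other, common, jaccard))
    # Most shared books first; ties broken by Jaccard score, then by id.
    ranked.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return ranked

# Illustrative data: (person_id, book_id) pairs.
pairs = [
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
    (2, 2), (2, 3), (2, 5),
    (3, 2), (3, 4), (3, 5),
    (4, 1), (4, 4),
    (5, 1), (5, 3), (5, 5),
]
for person_id, common, jaccard in rank_similar(pairs, target=4):
    print(person_id, common, round(jaccard, 2))
```

Raw overlap counts favor people with very long book lists, which is why a normalised score such as Jaccard is often used alongside them; whether that matters depends on your data.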