我想知道如何在R中设置一些基本匹配程序的例子。各种编程语言中有很多例子,但我还没有找到一个很好的例子。
假设我想让学生与项目相匹配,我会考虑在搜索此问题时遇到的3种替代方法:
1)Bipartite匹配案例:我要求每个学生指出要处理的3个项目(没有说明这3个项目中的任何偏好排名)。
ID T.1 T.2 T.3 T.4 T.5 T.6 T.7
1 1 1 1 0 0 0 0
2 0 0 0 0 1 1 1
3 0 1 1 1 0 0 0
4 0 0 0 1 1 1 0
5 1 0 1 0 1 0 0
6 0 1 0 0 0 1 1
7 0 1 1 0 1 0 0
-
d.1 <- structure(list(Student.ID = 1:7, Project.1 = c(1L, 0L, 0L, 0L,
1L, 0L, 0L), Project.2 = c(1L, 0L, 1L, 0L, 0L, 1L, 1L), Project.3 = c(1L,
0L, 1L, 0L, 1L, 0L, 1L), Project.4 = c(0L, 0L, 1L, 1L, 0L, 0L,
0L), Project.5 = c(0L, 1L, 0L, 1L, 1L, 0L, 1L), Project.6 = c(0L,
1L, 0L, 1L, 0L, 1L, 0L), Project.7 = c(0L, 1L, 0L, 0L, 0L, 1L,
0L)), .Names = c("Student.ID", "Project.1", "Project.2", "Project.3",
"Project.4", "Project.5", "Project.6", "Project.7"), class = "data.frame", row.names = c(NA,
-7L))
2)匈牙利算法:我要求每个学生名称3个项目在这些项目中进行处理,并在这些项目中列出偏好排名。据我所知,在这种情况下应用算法时的推理将是这样的:等级越高学生的“成本”越低。
ID T.1 T.2 T.3 T.4 T.5 T.6 T.7
1 3 2 1 na na na na
2 na na na na 1 2 3
3 na 1 3 2 na na na
4 na na na 1 2 3 na
5 2 na 3 na 1 na na
6 na 3 na na na 2 1
7 na 1 2 na 3 na na
-
d.2 <- structure(list(Student.ID = 1:7, Project.1 = structure(c(2L, 3L,
3L, 3L, 1L, 3L, 3L), .Label = c("2", "3", "na"), class = "factor"),
Project.2 = structure(c(2L, 4L, 1L, 4L, 4L, 3L, 1L), .Label = c("1",
"2", "3", "na"), class = "factor"), Project.3 = structure(c(1L,
4L, 3L, 4L, 3L, 4L, 2L), .Label = c("1", "2", "3", "na"), class = "factor"),
Project.4 = structure(c(3L, 3L, 2L, 1L, 3L, 3L, 3L), .Label = c("1",
"2", "na"), class = "factor"), Project.5 = structure(c(4L,
1L, 4L, 2L, 1L, 4L, 3L), .Label = c("1", "2", "3", "na"), class = "factor"),
Project.6 = structure(c(3L, 1L, 3L, 2L, 3L, 1L, 3L), .Label = c("2",
"3", "na"), class = "factor"), Project.7 = structure(c(3L,
2L, 3L, 3L, 3L, 1L, 3L), .Label = c("1", "3", "na"), class = "factor")), .Names = c("Student.ID",
"Project.1", "Project.2", "Project.3", "Project.4", "Project.5",
"Project.6", "Project.7"), class = "data.frame", row.names = c(NA,
-7L))
3)???方法:这应该与(2)非常相关。但是,我认为它可能是更好/更公平的方法(至少在示例的设置中)。学生不能选择项目,他们甚至不了解项目,但是他们对某项技能组合评定了他们的资格(1“不存在”到10“专业级别”)。此外,讲师还为每个项目评定了所需的技能组合。除了(2)之外,第一步是计算相似性矩阵,然后从上面运行优化程序。
PS: Programming Skills
SK: Statistical Knowledge
IE: Industry Experience
ID PS SK IE
1 10 9 8
2 1 2 10
3 10 2 5
4 2 5 3
5 10 2 10
6 1 10 1
7 5 5 5
-
d.3a <- structure(list(Student.ID = 1:7, Programming.Skills = c(10L, 1L,
10L, 2L, 10L, 1L, 5L), Statistical.knowlegde = c(9L, 2L, 2L,
5L, 2L, 10L, 5L), Industry.Expertise = c(8L, 10L, 5L, 3L, 10L,
1L, 5L)), .Names = c("Student.ID", "Programming.Skills", "Statistical.knowlegde",
"Industry.Expertise"), class = "data.frame", row.names = c(NA,
-7L))
-
T: Topic ID
PS: Programming Skills
SK: Statistical Knowledge
IE: Industry Experience
T PS SK IE
1 10 5 1
2 1 1 5
3 10 10 10
4 2 8 3
5 4 3 2
6 1 1 1
7 5 7 2
-
d.3b <- structure(list(Project.ID = 1:7, Programming.Skills = c(10L,
1L, 10L, 2L, 4L, 1L, 5L), Statistical.Knowlegde = c(5L, 1L, 10L,
8L, 3L, 1L, 7L), Industry.Expertise = c(1L, 5L, 10L, 3L, 2L,
1L, 2L)), .Names = c("Project.ID", "Programming.Skills", "Statistical.Knowlegde",
"Industry.Expertise"), class = "data.frame", row.names = c(NA,
-7L))
我很感激在R中实施这3种方法的任何帮助。谢谢。
更新: 以下问题似乎是相关的,但没有一个显示如何在R中解决它: https://math.stackexchange.com/questions/132829/group-membership-assignment-by-preferences-optimization-problem https://superuser.com/questions/467577/using-optimization-to-assign-by-preference
答案 0 :(得分:2)
以下是使用二分匹配和匈牙利算法的可能解决方案。
我提出的使用二分匹配的解决方案可能不是您想到的。下面的所有代码都是对指定的迭代次数进行随机抽样,之后至少可以确定一个解决方案。这可能需要大量迭代并且需要很长时间才能出现大问题。下面的代码在200次迭代中找到了三个解决你的示例问题的方法。
matrix1 <- matrix(c( 1, 1, 1, NA, NA, NA, NA,
NA, NA, NA, NA, 1, 1, 1,
NA, 1, 1, 1, NA, NA, NA,
NA, NA, NA, 1, 1, 1, NA,
1, NA, 1, NA, 1, NA, NA,
NA, 1, NA, NA, NA, 1, 1,
NA, 1, 1, NA, 1, NA, NA), nrow=7, byrow=TRUE)
set.seed(1234)
iters <- 200
my.match <- matrix(NA, nrow=iters, ncol=ncol(matrix1))
for(i in 1:iters) {
for(j in 1:nrow(matrix1)) {
my.match[i,j] <- sample(which(matrix1[j,] == 1), 1)
}
}
n.unique <- apply(my.match, 1, function(x) length(unique(x)))
my.match[n.unique==ncol(matrix1),]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 3 7 4 6 1 2 5
# [2,] 1 7 4 5 3 6 2
# [3,] 3 5 4 6 1 7 2
以下是使用包clue
和solve_LSAP()
的匈牙利语算法的代码,如@jackStinger建议的那样。为了实现这个目的,我不得不更换缺失的观察结果,我随意用4替换它们.5人没有得到他们的第一选择而7人没有得到他们的三个选择中的任何一个。
library(clue)
matrix1 <- matrix(c( 3, 2, 1, 4, 4, 4, 4,
4, 4, 4, 4, 1, 2, 3,
4, 1, 3, 2, 4, 4, 4,
4, 4, 4, 1, 2, 3, 4,
2, 4, 3, 4, 1, 4, 4,
4, 3, 4, 4, 4, 2, 1,
4, 1, 2, 4, 3, 4, 4), nrow=7, byrow=TRUE)
matrix1
solve_LSAP(matrix1, maximum = FALSE)
# Optimal assignment:
# 1 => 3, 2 => 5, 3 => 2, 4 => 4, 5 => 1, 6 => 7, 7 => 6
以下链接指向匈牙利算法如何运作的网站:http://www.wikihow.com/Use-the-Hungarian-Algorithm
编辑:2014年6月5日
这是我第一次尝试优化第三种情况。我将每个学生随机分配到一个项目,然后计算该组作业的成本。通过查找学生技能组合与项目所需技能之间的差异来计算成本。将这些差异的绝对值相加,得出七项任务的总成本。
下面我重复这个过程10,000次,并确定这10,000个任务中的哪一个导致最低成本。
另一种方法是对所有可能的学生项目作业进行详尽的搜索。
随机搜索和穷举搜索都不是您想到的。然而,前者将给出近似的最优解,而后者将给出精确的最优解。
我稍后可能会回到这个问题。
students <- matrix(c(10, 9, 8,
1, 2, 10,
10, 2, 5,
2, 5, 3,
10, 2, 10,
1, 10, 1,
5, 5, 5), nrow=7, ncol=3, byrow=TRUE)
projects <- matrix(c(10, 5, 1,
1, 1, 5,
10, 10, 10,
2, 8, 3,
4, 3, 2,
1, 1, 1,
5, 7, 2), nrow=7, ncol=3, byrow=TRUE)
iters <- 10000
# col = student, cell = project
assignments <- matrix(NA, nrow=iters, ncol=nrow(students))
for(i in 1:iters) {
assignments[i,1:7] <- sample(7,7,replace=FALSE)
}
cost <- matrix(NA, nrow=iters, ncol=nrow(students))
for(i in 1:iters) {
for(j in 1:nrow(students)) {
student <- j
project <- assignments[i,student]
student.cost <- rep(NA,3)
for(k in 1:3) {
student.cost[k] <- abs(students[student,k] - projects[project,k])
}
cost[i,j] <- sum(student.cost)
}
}
total.costs <- rowSums(cost)
assignment.costs <- cbind(assignments, total.costs)
head(assignment.costs)
assignment.costs[assignment.costs[,8]==min(assignment.costs[,8]),]
# total.costs
# [1,] 3 2 1 4 5 6 7 48
# [2,] 3 2 1 6 5 4 7 48
# [3,] 3 2 1 6 5 4 7 48
# student 1, project 3, cost = 3
# student 2, project 2, cost = 6
# student 3, project 1, cost = 7
# student 4, project 4, cost = 3
# student 5, project 5, cost = 15
# student 6, project 6, cost = 9
# student 7, project 7, cost = 5
3+6+7+3+15+9+5
# [1] 48
编辑:2014年6月6日
这是详尽的搜索。只有5040种方法可以为七名学生分配项目。此搜索返回四个最佳解决方案:
students <- matrix(c(10, 9, 8,
1, 2, 10,
10, 2, 5,
2, 5, 3,
10, 2, 10,
1, 10, 1,
5, 5, 5), nrow=7, ncol=3, byrow=TRUE)
projects <- matrix(c(10, 5, 1,
1, 1, 5,
10, 10, 10,
2, 8, 3,
4, 3, 2,
1, 1, 1,
5, 7, 2), nrow=7, ncol=3, byrow=TRUE)
library(combinat)
n <- nrow(students)
assignments <- permn(1:n)
assignments <- do.call(rbind, assignments)
dim(assignments)
# column of assignments = student
# row of assignments = iteration
# cell of assignments = project
cost <- matrix(NA, nrow=nrow(assignments), ncol=n)
for(i in 1:(nrow(assignments))) {
for(student in 1:n) {
project <- assignments[i,student]
student.cost <- rep(NA,3)
for(k in 1:3) {
student.cost[k] <- abs(students[student,k] - projects[project,k])
}
cost[i,student] <- sum(student.cost)
}
}
total.costs <- rowSums(cost)
assignment.costs <- cbind(assignments, total.costs)
head(assignment.costs)
assignment.costs[assignment.costs[,(n+1)]==min(assignment.costs[,(n+1)]),]
total.costs
[1,] 3 2 5 4 1 6 7 48
[2,] 3 2 5 6 1 4 7 48
[3,] 3 2 1 6 5 4 7 48
[4,] 3 2 1 4 5 6 7 48