采用条件采样矩阵(无零或重复列)

时间:2016-12-14 18:46:18

标签: python r matrix sampling

如果您对问题的背景感兴趣,我正在思考如何解决this post - 顺便说一下,如果你在那里解决它,我只会删除这个问题。理想情况下,我想获得一个analytical or algebraic solution (constrained non-capturing rook problem),但我不喜欢模拟。顺便说一下,我发布的related question没有那么详细,以防更容易解决。

但是你不必离开这个页面。基本上有两个足球队名单的配对,有些配对很好,而其他配对则是规则禁止的。这是矩阵:

enter image description here

因此要生成多个采样以匹配行名称(左侧)上的团队与对方团队的列名(在顶部),我必须提出一个条件采样程序,但我不知道怎么样。

这是我到目前为止所尝试的:

BCN = c(0,2,3,4,0,0,7,8)
ATL = c(0,0,3,4,5,0,7,8)
DOR = c(0,0,3,4,5,6,7,0)
MON = c(1,2,3,0,5,6,7,0)
ARS = c(1,2,3,0,0,6,7,8)
LEI = c(1,2,3,4,0,6,0,8)
JUV = c(1,2,3,4,5,0,7,8)
NAP = c(1,2,0,4,5,6,7,8)

chessboard = t(as.matrix(data.frame(BCN, ATL, DOR, MON, ARS, LEI, JUV, NAP)))
colnames(chessboard) = c("MAD", "BYN", "BEN", "PSG", "MCY", "SEV", "OPO", "LEV")
chessboard
        MAD BYN BEN PSG MCY SEV OPO LEV
  BCN   0   2   3   4   0   0   7   8
  ATL   0   0   3   4   5   0   7   8
  DOR   0   0   3   4   5   6   7   0
  MON   1   2   3   0   5   6   7   0
  ARS   1   2   3   0   0   6   7   8
  LEI   1   2   3   4   0   6   0   8
  JUV   1   2   3   4   5   0   7   8
  NAP   1   2   0   4   5   6   7   8

match = function(){
vec = rep(0,8)
for(i in 1:8){
tryCatch({vec[i] = as.numeric(sample(as.character(chessboard[i,][!(chessboard[i,] %in% vec) & chessboard[i,] > 0]),1))
last=chessboard[8,][!(chessboard[8,] %in% vec) & chessboard[i,] > 0]
},error=function(e){})
}
vec
}
match()

set.seed(0)
nsim = 100000
matches = t(replicate(nsim, match()))
matches = subset(matches, matches[,8]!=0)
colnames(matches) = c("BCN", "ATL", "DOR", "MON", "ARS", "LEI", "JUV", "NAP")
head(matches)

table = apply(matches, 2, function(x) table(x)/nrow(matches))
table
$BCN
x
        2         3         4         7         8 
0.1969821 0.2125814 0.1967272 0.1967166 0.1969927 

$ATL
x
        3         4         5         7         8 
0.2016226 0.1874462 0.2357732 0.1875737 0.1875843 

$DOR
x
        3         4         5         6         7 
0.1773264 0.1686188 0.2097673 0.2787270 0.1655605 

$MON
x
        1         2         3         5         6         7 
0.2567882 0.2031199 0.1172017 0.1341921 0.1789617 0.1097365 

$ARS
x
        1         2         3         6         7         8 
0.2368882 0.1907169 0.1104480 0.1651358 0.1026112 0.1941999 

$LEI
x
        1         2         3         4         6         8 
0.2129743 0.1717302 0.1019210 0.1856410 0.1511081 0.1766255 

$JUV
x
         1          2          3          4          5          7          8 
0.15873252 0.12940289 0.07889902 0.14203948 0.22837179 0.12845781 0.13409648 

$NAP
x
        1         2         4         5         6         7         8 
0.1346168 0.1080481 0.1195272 0.1918956 0.2260675 0.1093436 0.1105011 

1 个答案:

答案 0 :(得分:1)

也许试试这个:

matches = setNames(as.list(rep(NA,8)), rownames(mat))
set.seed(1)    

# For each row, sample a column, then drop that column.
# 'sample.int' will automatically renormalize the probabilities.
for (i in sample.int(8)) {
  team_i = rownames(mat)[i]

  j = sample.int(ncol(mat), 1, prob=mat[i,])

  matches[[team_i]] = colnames(mat)[j]

  mat = mat[,-j,drop=FALSE]
}

> matches
# $Barcelona
# [1] "Oporto"
# 
# $Atletico
# [1] "Benfica"
# 
# $Dortmund
# [1] "Paris"
# 
# $Juventus
# [1] "City"
# 
# $Arsenal
# [1] "Sevilla"
# 
# $Napoli
# [1] "Leverkusen"
# 
# $Monaco
# [1] "Bayern"
# 
# $Leicester
# [1] "Madrid"

添加限制可能是一个好主意,这样你就不会得到一行零。