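I'd like to know whether it's possible to shuffle a 4x4 dataset while keeping its row and column sums constant. Admittedly, I'm a beginner at programming, so the code below may not be easy to follow.
Any help would be greatly appreciated, thanks.
PS: In case you need to know, the dataset is a survey of car preferences by ethnicity.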
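# Read the survey counts; the first column holds the row labels
CarPreference <- read.table(text = "
African 3 0 1 1
Asian 2 1 0 1
Hispanic 0 1 3 1
White 0 1 4 1
")
row.names(CarPreference) <- CarPreference[, 1]
colnames(CarPreference) <- c("Car Type", "Car", "Truck", "SUV", "Motorcycle")
CarPreference <- CarPreference[, -1]
# Assign the result; a bare as.matrix() call only prints it
CarPreference <- as.matrix(CarPreference)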
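A slightly more compact way to build the same matrix, sketched under the assumption that you'd rather derive observed from the data than retype it: read.table() can take the row names from the first column via row.names = 1. The first column name "Ethnicity" is a hypothetical label; it is dropped along with that column.

```r
# Read the counts, using the first column as row names
# ("Ethnicity" is a hypothetical name for the label column, which is dropped)
CarPreference <- read.table(text = "
African 3 0 1 1
Asian 2 1 0 1
Hispanic 0 1 3 1
White 0 1 4 1
", row.names = 1, col.names = c("Ethnicity", "Car", "Truck", "SUV", "Motorcycle"))

# Convert to a numeric matrix; row and column names are kept
observed <- as.matrix(CarPreference)
print(observed)
print(rowSums(observed))  # African 5, Asian 4, Hispanic 5, White 6
print(colSums(observed))  # Car 5, Truck 3, SUV 8, Motorcycle 4
```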
observed <- rbind(c(3,0,1,1),c(2,1,0,1),c(0,1,3,1),c(0,1,4,1))
deals <- 10000
observed.boot <- array(NA, c(4, 4, deals))
# One entry per respondent, labelled with the column (vehicle type) they chose
H0 <- rep(1:ncol(observed), colSums(observed))
# First and last position of each row's block in the shuffled vector
row.end <- cumsum(rowSums(observed))
row.start <- c(1, head(row.end, -1) + 1)
for (i in 1:deals)
{
  # Shuffle the labels; every label is used exactly once, so column sums stay fixed
  data.boot <- sample(H0, sum(observed), replace = FALSE)
  # Give row r exactly rowSums(observed)[r] labels and count them per column
  for (r in 1:nrow(observed)) {
    row.boot <- data.boot[row.start[r]:row.end[r]]
    observed.boot[r, , i] <- tabulate(row.boot, nbins = ncol(observed))
  }
}
Answer
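Boiled down, you want to randomly re-pair the row labels of your observations with their column labels. You can do this by building a vector x with the row index of every observation and a vector y with the column index of every observation, then repeatedly shuffling y: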
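set.seed(144)
observed <- rbind(c(3,0,1,1),c(2,1,0,1),c(0,1,3,1),c(0,1,4,1))
# Row index of every observation (row i repeated rowSums(observed)[i] times)
x <- rep(1:nrow(observed), rowSums(observed))
# Column index of every observation
y <- rep(1:ncol(observed), colSums(observed))
# Each shuffle of y re-pairs row and column labels; table() counts the pairs
samples <- lapply(1:10000, function(a) table(x, sample(y)))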
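To make the mechanism concrete, here is a small sketch of what x and y look like for this table (the name one.draw is just illustrative). x lists the row of each of the 20 observations and y lists their columns, so pairing x with a shuffled y uses every row label and every column label exactly once.

```r
observed <- rbind(c(3,0,1,1), c(2,1,0,1), c(0,1,3,1), c(0,1,4,1))

# Row of each observation: 1 1 1 1 1 2 2 2 2 3 3 3 3 3 4 4 4 4 4 4
x <- rep(1:nrow(observed), rowSums(observed))
# Column of each observation: 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4
y <- rep(1:ncol(observed), colSums(observed))

# One random re-pairing; counts per row come from x, counts per column from y
one.draw <- table(x, sample(y))
print(one.draw)
print(rowSums(one.draw))  # always 5 4 5 6
print(colSums(one.draw))  # always 5 3 8 4
```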
Now samples is a list of 10,000 resampled tables whose row and column sums match those of observed:
samples[[1]]
# x 1 2 3 4
# 1 1 1 2 1
# 2 0 0 2 2
# 3 2 0 2 1
# 4 2 2 2 0
samples[[10000]]
# x 1 2 3 4
# 1 1 1 2 1
# 2 2 1 1 0
# 3 1 1 2 1
# 4 1 0 3 2
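If you want to confirm that every one of the 10,000 tables, not just the two printed above, keeps the original margins, a quick check could look like this (same.margins is a hypothetical helper name):

```r
set.seed(144)
observed <- rbind(c(3,0,1,1), c(2,1,0,1), c(0,1,3,1), c(0,1,4,1))
x <- rep(1:nrow(observed), rowSums(observed))
y <- rep(1:ncol(observed), colSums(observed))
samples <- lapply(1:10000, function(a) table(x, sample(y)))

# Hypothetical helper: TRUE if a table has exactly the margins of `observed`
same.margins <- function(tab) {
  all(rowSums(tab) == rowSums(observed)) && all(colSums(tab) == colSums(observed))
}
all.ok <- all(vapply(samples, same.margins, logical(1)))
print(all.ok)  # TRUE
```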
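This is the same as randomly sampling contingency tables with the same row and column sums as the original table. Note that the tables are not drawn uniformly: each one appears with its probability under independence of rows and columns given those margins (the multivariate hypergeometric distribution).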
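If you don't need the individual shuffles, base R can draw such tables directly: stats::r2dtable(n, r, c) uses Patefield's algorithm to generate n random tables with row totals r and column totals c, from the same conditional distribution as the shuffling approach. A minimal sketch:

```r
observed <- rbind(c(3,0,1,1), c(2,1,0,1), c(0,1,3,1), c(0,1,4,1))
set.seed(144)

# List of 10000 integer matrices with the same margins as `observed`
tables <- r2dtable(10000, rowSums(observed), colSums(observed))
print(tables[[1]])
```

The result is a plain list of matrices, so it can be used anywhere the samples list from the answer is used.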